Libraries

In [1]:
!pip install seaborn plotly scikit-learn xgboost
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import xgboost as xgb
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score, KFold
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import (mean_squared_error, mean_absolute_error,
                             mean_absolute_percentage_error, r2_score)
import warnings
warnings.filterwarnings('ignore')

Reading the dataset

In [2]:
data = pd.read_csv(r"C:\Users\Admin\OneDrive\Escritorio\ML_Bancolombia_Leader_Test\Punto_1\restaurants_dataset.csv")
print(f"Dimensión del dataset: {data.shape}  (filas, columnas)\n")
print("Tipos de datos por columna:\n", data.dtypes, "\n")
data.head()
Dimensión del dataset: (3493, 34)  (filas, columnas)

Tipos de datos por columna:
 Registration Number                int64
Annual Turnover                    int64
Cuisine                           object
City                              object
Restaurant Location               object
Opening Day of Restaurant         object
Facebook Popularity Quotient     float64
Endorsed By                       object
Instagram Popularity Quotient    float64
Fire Audit                         int64
Liquor License Obtained            int64
Situated in a Multi Complex        int64
Dedicated Parking                  int64
Open Sitting Available             int64
Resturant Tier                   float64
Restaurant Type                   object
Restaurant Theme                  object
Restaurant Zomato Rating           int64
Restaurant City Tier               int64
Order Wait Time                    int64
Staff Responsivness                int64
Value for Money                    int64
Hygiene Rating                     int64
Food Rating                        int64
Overall Restaurant Rating        float64
Live Music Rating                float64
Comedy Gigs Rating               float64
Value Deals Rating               float64
Live Sports Rating               float64
Ambience                         float64
Lively                             int64
Service                            int64
Comfortablility                    int64
Privacy                            int64
dtype: object 

Out[2]:
Registration Number Annual Turnover Cuisine City Restaurant Location Opening Day of Restaurant Facebook Popularity Quotient Endorsed By Instagram Popularity Quotient Fire Audit ... Overall Restaurant Rating Live Music Rating Comedy Gigs Rating Value Deals Rating Live Sports Rating Ambience Lively Service Comfortablility Privacy
0 60001 42000000 indian,irish Bangalore Near Business Hub 14-02-2009 84.3 Not Specific 95.8 1 ... 10.0 4.0 NaN NaN NaN 8.0 8 6 6 6
1 60002 50000000 indian,irish Indore Near Party Hub 29-09-2008 85.4 Tier A Celebrity 85.0 1 ... 9.0 NaN 4.0 NaN NaN 5.0 7 7 3 8
2 60003 32500000 tibetan,italian Chennai Near Business Hub 30-07-2011 85.0 Tier A Celebrity 68.2 1 ... 8.0 3.0 NaN NaN NaN 7.0 10 5 2 8
3 60004 110000000 turkish,nigerian Gurgaon Near Party Hub 30-11-2008 85.6 Tier A Celebrity 83.6 0 ... 9.0 6.0 NaN NaN NaN 7.0 7 4 3 5
4 60005 20000000 irish,belgian Manesar Near Party Hub 22-02-2010 NaN Tier A Celebrity 76.8 1 ... 6.0 NaN 2.0 NaN NaN NaN 6 2 4 6

5 rows × 34 columns

Missing values

In [3]:
missing_pct = data.isnull().mean() * 100
missing_pct = missing_pct[missing_pct > 0].sort_values(ascending=False)

# Show the table of null percentages
print("Porcentaje de valores nulos por columna:\n")
display(missing_pct.to_frame(name='% Nulos'))

# Bar chart
plt.figure(figsize=(8, 5))
sns.barplot(
    x=missing_pct.values,
    y=missing_pct.index,
    palette="viridis"
)
plt.title("Porcentaje de valores nulos por columna")
plt.xlabel("% de valores faltantes")
plt.ylabel("Columnas con nulos")
plt.tight_layout()
plt.show()
Porcentaje de valores nulos por columna:

% Nulos
Live Sports Rating 94.131119
Value Deals Rating 77.497853
Comedy Gigs Rating 71.085027
Live Music Rating 21.900945
Overall Restaurant Rating 6.069281
Facebook Popularity Quotient 2.834240
Instagram Popularity Quotient 1.603206
Resturant Tier 1.402806
Ambience 0.715717
In [4]:
num_cols = data.select_dtypes(include=['int64','float64']).columns.tolist()
print("Estadísticas descriptivas de variables numéricas:\n")
display(data[num_cols].describe().T)
Estadísticas descriptivas de variables numéricas:

count mean std min 25% 50% 75% max
Registration Number 3493.0 6.174700e+04 1.008487e+03 60001.0 60874.0 61747.00 6.262000e+04 6.349300e+04
Annual Turnover 3493.0 3.072571e+07 2.165125e+07 3500000.0 18000000.0 30000000.00 3.700000e+07 4.000000e+08
Facebook Popularity Quotient 3394.0 7.793872e+01 9.829169e+00 43.0 72.0 79.00 8.574500e+01 9.776000e+01
Instagram Popularity Quotient 3437.0 7.440468e+01 1.094033e+01 40.0 66.0 74.05 8.240000e+01 9.870000e+01
Fire Audit 3493.0 7.887203e-01 4.082748e-01 0.0 1.0 1.00 1.000000e+00 1.000000e+00
Liquor License Obtained 3493.0 9.882622e-01 1.077187e-01 0.0 1.0 1.00 1.000000e+00 1.000000e+00
Situated in a Multi Complex 3493.0 8.081878e-01 3.937825e-01 0.0 1.0 1.00 1.000000e+00 1.000000e+00
Dedicated Parking 3493.0 8.018895e-01 3.986329e-01 0.0 1.0 1.00 1.000000e+00 1.000000e+00
Open Sitting Available 3493.0 8.001718e-01 3.999284e-01 0.0 1.0 1.00 1.000000e+00 1.000000e+00
Resturant Tier 3444.0 1.926539e+00 2.609297e-01 1.0 2.0 2.00 2.000000e+00 2.000000e+00
Restaurant Zomato Rating 3493.0 2.696536e+00 7.872713e-01 0.0 2.0 3.00 3.000000e+00 5.000000e+00
Restaurant City Tier 3493.0 3.014601e-01 4.589577e-01 0.0 0.0 0.00 1.000000e+00 1.000000e+00
Order Wait Time 3493.0 5.509591e+00 2.854476e+00 1.0 3.0 5.00 8.000000e+00 1.000000e+01
Staff Responsivness 3493.0 4.538506e+00 1.093832e+00 1.0 4.0 5.00 5.000000e+00 8.000000e+00
Value for Money 3493.0 4.526482e+00 9.139370e-01 1.0 4.0 5.00 5.000000e+00 7.000000e+00
Hygiene Rating 3493.0 4.661895e+00 1.259523e+00 1.0 4.0 5.00 6.000000e+00 9.000000e+00
Food Rating 3493.0 7.522760e+00 1.722721e+00 5.0 6.0 7.00 9.000000e+00 1.000000e+01
Overall Restaurant Rating 3281.0 8.479427e+00 1.287233e+00 6.0 7.0 9.00 1.000000e+01 1.000000e+01
Live Music Rating 2728.0 4.012830e+00 1.009044e+00 1.0 3.0 4.00 5.000000e+00 8.000000e+00
Comedy Gigs Rating 1010.0 2.932673e+00 8.595190e-01 1.0 2.0 3.00 3.000000e+00 6.000000e+00
Value Deals Rating 786.0 3.655216e+00 9.732496e-01 1.0 3.0 4.00 4.000000e+00 7.000000e+00
Live Sports Rating 205.0 3.590244e+00 9.063515e-01 2.0 3.0 4.00 4.000000e+00 6.000000e+00
Ambience 3468.0 6.423010e+00 2.050026e+00 0.0 5.0 7.00 8.000000e+00 1.000000e+01
Lively 3493.0 6.874893e+00 1.847131e+00 0.0 6.0 7.00 8.000000e+00 1.000000e+01
Service 3493.0 4.546808e+00 1.877063e+00 0.0 3.0 5.00 6.000000e+00 1.000000e+01
Comfortablility 3493.0 3.231320e+00 1.993050e+00 0.0 2.0 3.00 5.000000e+00 1.000000e+01
Privacy 3493.0 6.275122e+00 1.895057e+00 0.0 5.0 6.00 8.000000e+00 1.000000e+01

From the table above we can conclude that:

  • Annual Turnover: the mean is 30.7M and the median 30M, so the two are relatively close; still, the mean is slightly above the median, which points to a right-skewed distribution with a tail of very high values, possibly outliers.
  • Annual Turnover also spans a wide range, from 3.5M to 400M, with a standard deviation of 21.65M, so the data are quite dispersed.
  • Social-media popularity is similar on both platforms, with means of 77.9 for Facebook and 74.4 for Instagram. Most Facebook values fall between 72 and 97, and most Instagram values between 66 and 98, which tells us that the majority of restaurants enjoy good popularity.
  • The average Food Rating is 7.52, indicating many restaurants with good food; most scores run from 6 to 10.
  • Ambience, Lively, and Privacy average around 6, while Service and Comfortablility average 4.5 and 3.23. Most restaurants therefore sit at mid-level ratings, with tails of restaurants rated very high and others rated very low.
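The right-skew claim above can be verified numerically: for a right-skewed variable the mean exceeds the median and `Series.skew()` is positive. A minimal sketch on synthetic turnover-like values (the helper `skew_summary` is made up for illustration, not part of the notebook):

```python
import pandas as pd

def skew_summary(s: pd.Series) -> dict:
    """Report mean, median, and skewness; skew > 0 indicates a right tail."""
    return {
        "mean": s.mean(),
        "median": s.median(),
        "skew": s.skew(),
        "right_skewed": s.mean() > s.median(),
    }

# Toy data with a long right tail, mimicking Annual Turnover's 3.5M-400M range
toy = pd.Series([3.5e6, 18e6, 30e6, 37e6, 400e6])
print(skew_summary(toy))
```

On the real data the same check would simply be `data['Annual Turnover'].skew()`.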
In [5]:
plt.figure(figsize=(14, 4))

plt.subplot(1, 3, 1)
sns.histplot(data['Annual Turnover'], kde=True)
plt.title("Distribución de Facturación Anual")
plt.xlabel("Annual Turnover")

plt.subplot(1, 3, 2)
sns.histplot(data['Instagram Popularity Quotient'], kde=True, color='coral')
plt.title("Distribución de Popularidad en Instagram")
plt.xlabel("Instagram Popularity Quotient")

plt.subplot(1, 3, 3)
sns.histplot(data['Facebook Popularity Quotient'], kde=True, color='seagreen')
plt.title("Distribución de Popularidad en Facebook")
plt.xlabel("Facebook Popularity Quotient")

plt.tight_layout()
plt.show()
In [6]:
plt.figure(figsize=(14, 4))

plt.subplot(1, 5, 1)
sns.histplot(data['Ambience'], kde=True)
plt.title("Distribución de Ambiente")
plt.xlabel("Ambience")

plt.subplot(1, 5, 2)
sns.histplot(data['Lively'], kde=True, color='coral')
plt.title("Distribución de Lively")
plt.xlabel("Lively")

plt.subplot(1, 5, 3)
sns.histplot(data['Service'], kde=True, color='seagreen')
plt.title("Distribución de Service")
plt.xlabel("Service")

plt.subplot(1, 5, 4)
sns.histplot(data['Comfortablility'], kde=True, color='green')
plt.title("Distribución de Comfortablility")
plt.xlabel("Comfortablility")

plt.subplot(1, 5, 5)
sns.histplot(data['Privacy'], kde=True, color='yellow')
plt.title("Distribución de Privacy")
plt.xlabel("Privacy")

plt.tight_layout()
plt.show()
In [7]:
# Median-impute the four high-missingness rating columns into new columns
# (note: despite the '_missing' suffix, these hold the imputed values)
data['Live Sports Rating_missing'] = data['Live Sports Rating'].fillna(data['Live Sports Rating'].median())
data['Value Deals Rating_missing'] = data['Value Deals Rating'].fillna(data['Value Deals Rating'].median())
data['Comedy Gigs Rating_missing'] = data['Comedy Gigs Rating'].fillna(data['Comedy Gigs Rating'].median())
data['Live Music Rating_missing'] = data['Live Music Rating'].fillna(data['Live Music Rating'].median())
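The cell above writes the median-filled values into new `*_missing` columns. A common variant, sketched here on toy data, keeps an explicit 0/1 indicator of which rows were originally missing alongside the imputed value, so a model can learn from the missingness itself (the helper `impute_median_with_flag` is hypothetical, not part of the notebook):

```python
import numpy as np
import pandas as pd

def impute_median_with_flag(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Median-impute `col`, keeping a 0/1 flag for rows that were NaN."""
    out = df.copy()
    out[f"{col}_was_missing"] = out[col].isna().astype(int)
    out[col] = out[col].fillna(out[col].median())
    return out

toy = pd.DataFrame({"Live Sports Rating": [4.0, np.nan, 2.0, np.nan]})
toy = impute_median_with_flag(toy, "Live Sports Rating")
print(toy)
```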
In [8]:
from matplotlib.ticker import FuncFormatter
def millions(x, pos):
    """Format the tick value x as '10M', '50M', etc."""
    return f'{int(x/1e6)}M'

metrics = ['Live Music Rating','Live Sports Rating', 'Value Deals Rating', 'Comedy Gigs Rating']
colors = ['#264653', '#2a9d8f', '#e9c46a', '#e9c46a']

plt.figure(figsize=(15, 5))

for i, (metric, color) in enumerate(zip(metrics, colors), 1):
    ax = plt.subplot(2, 2, i)
    sns.boxplot(
        data=data,
        x=metric,
        y='Annual Turnover',
        color=color
    )
    ax.set_yscale('log')
    ax.yaxis.set_major_formatter(FuncFormatter(millions))
    ax.set_xlabel(metric)
    if i == 1:
        ax.set_ylabel('Annual Turnover')
    else:
        ax.set_ylabel('')
    ax.set_title(f'Turnover vs {metric}')

plt.suptitle('Facturación Anual según Ratings', y=1.02)
plt.tight_layout()
plt.show()
  • As the live-music rating rises (from 1 to 8), the median annual turnover tends to grow.
  • Likewise, a better live-sports rating goes hand in hand with a rising median annual turnover.
  • Value-deals ratings behave differently: the growth peaks at a rating of 4 and then starts to fall off.
In [9]:
import plotly.io as pio
pio.renderers.default = "notebook" 
top_cities = data['City'].value_counts().nlargest(20).reset_index()
top_cities.columns = ['City', 'Count']
fig1 = px.bar(
    top_cities,
    x='Count',
    y='City',
    orientation='h',
    title='Top 20 Ciudades con más Restaurantes',
    labels={'Count': 'Cantidad de Restaurantes', 'City': 'Ciudad'}
)
fig1.update_layout(
    yaxis=dict(categoryorder='total ascending'),
    margin=dict(l=100, r=20, t=50, b=50)
)
fig1.show()

The chart above shows the 20 cities with the most restaurants. One odd thing stands out: there is a -1 among the city categories, which most likely encodes a "NULL"; we will clean it up later.

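A sketch of that cleanup, assuming the '-1' placeholder is stored as a string and that case variants such as 'chennai'/'Chennai' (visible in the frequency table further down) should collapse into one category; `clean_city` is a hypothetical helper:

```python
import numpy as np
import pandas as pd

def clean_city(s: pd.Series) -> pd.Series:
    """Map the '-1' placeholder to NaN and title-case names so case variants merge."""
    s = s.replace({"-1": np.nan})
    return s.str.strip().str.title()

cities = pd.Series(["Bangalore", "-1", "chennai", "Chennai ", "pune"])
print(clean_city(cities).value_counts(dropna=False))
```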
In [10]:
group = (
    data
    .groupby(['City', 'Cuisine'])['Annual Turnover']
    .sum()
    .reset_index()
)

top_cities = (
    data
    .groupby('City')['Annual Turnover']
    .sum()
    .nlargest(10)
    .index
)

group_top = group[group['City'].isin(top_cities)]

pivot = (
    group_top
    .pivot(index='City', columns='Cuisine', values='Annual Turnover')
    .fillna(0)
    .loc[top_cities]
)

pivot.plot(
    kind='barh',
    stacked=True,
    width=0.8,
    figsize=(12, 8),  # create the figure here; a prior plt.figure() would be left empty
    legend=True
)

plt.title('Facturación Anual por Ciudad y Tipo de Cocina\n(Top 10 Ciudades)')
plt.xlabel('Facturación Anual')
plt.ylabel('Ciudad')
plt.gca().invert_yaxis()

plt.legend(
    title='Cocina',
    bbox_to_anchor=(1.02, 1),
    loc='upper left'
)

plt.tight_layout()
plt.show()
In [11]:
freq_df = (
    data['City']
    .value_counts()
    .rename_axis('value')
    .reset_index(name='count')
)
freq_df[freq_df['count']>20]
Out[11]:
value count
0 Bangalore 553
1 -1 396
2 Noida 324
3 Hyderabad 295
4 Pune 262
5 Chennai 244
6 New Delhi 176
7 Gurgaon 174
8 Mumbai 90
9 Kolkata 88
10 Jaipur 36
11 Mysore 32
12 Lucknow 28
13 chennai 27
14 Greater Noida 24
15 pune 23
16 Navi Mumbai 22
17 Indore 21
18 Chandigarh 21

Many cities appear only a few times in the dataset and may contribute little information, so the number of city categories will be reduced later on.

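That reduction can be done by lumping categories below a frequency threshold into a single 'Other' label. A minimal sketch (the threshold of 20 mirrors the filter used in the frequency table above, and `lump_rare` is a made-up helper):

```python
import pandas as pd

def lump_rare(s: pd.Series, min_count: int = 20, other: str = "Other") -> pd.Series:
    """Replace categories with fewer than min_count occurrences by one label."""
    counts = s.value_counts()
    keep = counts[counts >= min_count].index
    return s.where(s.isin(keep), other)

cities = pd.Series(["Bangalore"] * 30 + ["Noida"] * 25 + ["Mysore"] * 3 + ["Indore"] * 2)
print(lump_rare(cities).value_counts().to_dict())
```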
In [12]:
prop = pd.crosstab(
    data['Resturant Tier'],
    data['Endorsed By'],
    normalize='index'
)

prop.plot(
    kind='bar',
    stacked=True,
    width=0.8,
    figsize=(10, 6)  # create the figure here; a prior plt.figure() would be left empty
)

plt.title('Proporción de Endorsed by por Restaurant Tier')
plt.xlabel('Restaurant Tier')
plt.ylabel('Proporción (100%)')
plt.legend(
    title='Endorsed by',
    bbox_to_anchor=(1.02,1),
    loc='upper left'
)
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()

From the chart above we can conclude that Tier 1.0 restaurants attract far more attention from celebrities, whereas Tier 2 restaurants, perhaps for lack of resources, settle for endorsements from other kinds of public figures.

In [13]:
plt.figure(figsize=(6, 4))
sns.countplot(
    data=data,
    y='Restaurant Location',
    order=data['Restaurant Location'].value_counts().index,
    palette='viridis'
)
plt.title('Número de Restaurantes por Ubicación')
plt.xlabel('Cantidad de Restaurantes')
plt.ylabel('Ubicación')
plt.tight_layout()
plt.show()
In [14]:
num_cols = data.select_dtypes(include=['int64','float64']).columns.tolist()
if 'Registration Number' in num_cols:
    num_cols.remove('Registration Number')

corr = data[num_cols].corr()

mask = np.triu(np.ones_like(corr, dtype=bool))

plt.figure(figsize=(12, 10))
sns.heatmap(
    corr,
    mask=mask,
    cmap='coolwarm',
    annot=True,
    fmt=".2f",
    linewidths=0.5,
    cbar_kws={"shrink": .7},
    square=True
)
plt.title('Matriz de Correlación de Variables Numéricas')
plt.tight_layout()
plt.show()
In [15]:
from matplotlib.ticker import FuncFormatter
def millions(x, pos):
    """Format the tick value x as '10M', '50M', etc."""
    return f'{int(x/1e6)}M'

plt.figure(figsize=(8, 5))
ax = sns.boxplot(
    data=data,
    x='Restaurant Location',
    y='Annual Turnover',
    palette=['#2a9d8f','#e76f51']
)

ax.set_yscale('log')
ax.yaxis.set_major_formatter(FuncFormatter(millions))

plt.title('Distribución de Facturación Anual por Ubicación')
plt.xlabel('Ubicación del Restaurante')
plt.ylabel('Annual Turnover')
plt.tight_layout()
plt.show()
  • Both categories have fairly similar medians, although restaurants near party hubs sit slightly higher. The interquartile ranges are also very alike, roughly 20M to 40M.
  • Party-hub locations show more outliers, which suggests they reach high turnover more often; their upper whisker also extends a bit further than that of Near Business Hub.
  • This may indicate that location alone does not drastically change typical turnover, although party hubs do produce somewhat more high-turnover success stories.
In [16]:
plt.figure(figsize=(8, 5))
sns.regplot(
    data=data,
    x='Instagram Popularity Quotient',
    y='Annual Turnover',
    scatter_kws={'alpha':0.4, 's':30},
    line_kws={'color':'crimson'}
)
plt.yscale('log') 
plt.title('Relación entre Popularidad en Instagram y Facturación Anual')
plt.xlabel('Instagram Popularity Quotient')
plt.ylabel('Annual Turnover (escala log)')
plt.tight_layout()
plt.show()

From the chart above we can conclude that there is a positive relationship between Instagram popularity and annual turnover. Given how scattered the points are, however, the relationship is weak: restaurants with similar popularity can show very different turnovers.

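That "positive but weak" reading can be quantified with a correlation coefficient. Since the CSV is not bundled here, this sketch uses synthetic data shaped like the scatter (a faint linear signal buried in noise); on the real data it would be `data['Instagram Popularity Quotient'].corr(data['Annual Turnover'])`:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
insta = pd.Series(rng.uniform(40, 99, 1000))
# weak positive signal plus heavy noise, mimicking the dispersion in the plot
turnover = 3e7 + 3e5 * insta + pd.Series(rng.normal(0, 2e7, 1000))

r_pearson = insta.corr(turnover)                      # linear association
r_spearman = insta.corr(turnover, method="spearman")  # rank (monotonic) association
print(f"pearson={r_pearson:.2f}  spearman={r_spearman:.2f}")
```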
In [17]:
metrics = ['Staff Responsivness', 'Value for Money', 'Hygiene Rating']
colors = ['#264653', '#2a9d8f', '#e9c46a']

plt.figure(figsize=(15, 5))

for i, (metric, color) in enumerate(zip(metrics, colors), 1):
    ax = plt.subplot(1, 3, i)
    sns.boxplot(
        data=data,
        x=metric,
        y='Annual Turnover',
        color=color  # assign the color directly
    )
    ax.set_yscale('log')
    ax.yaxis.set_major_formatter(FuncFormatter(millions))
    ax.set_xlabel(metric)
    if i == 1:
        ax.set_ylabel('Annual Turnover')
    else:
        ax.set_ylabel('')
    ax.set_title(f'Turnover vs {metric}')

plt.suptitle('Facturación Anual según Staff, Valor y Higiene', y=1.02)
plt.tight_layout()
plt.show()
  • Staff Responsivness:

  • Restaurants rated 2 to 3 bill around 15 to 17M.

  • Mid-range scores (4 to 5) come with higher turnover, between 25 and 35M.

  • The best turnover ranges appear between 6 and 8 points, so better service translates into higher income, and the effect is most visible at the top scores of 6, 7, and 8.

  • Value for Money:

  • It behaves much like Staff Responsivness, but the rise in turnover is far more pronounced at high scores. Low scores can mean turnover below 15M, while high scores improve turnover markedly.

  • The better the value customers perceive, the higher the turnover.

  • Hygiene Rating:

  • The pattern is very marked here: excellent hygiene drives turnover up sharply, visible at the highest scores (7, 8, and 9), while restaurants with low hygiene scores bill notably little, under roughly 15M.

In [18]:
plt.figure(figsize=(12, 5))

# Resturant Tier
ax1 = plt.subplot(1, 2, 1)
sns.boxplot(
    data=data,
    x='Resturant Tier',
    y='Annual Turnover',
    palette='Set2'
)
ax1.set_yscale('log')
ax1.yaxis.set_major_formatter(FuncFormatter(millions))
ax1.set_title('Turnover vs Resturant Tier')
ax1.set_xlabel('Resturant Tier')
ax1.set_ylabel('Annual Turnover')
Out[18]:
Text(0, 0.5, 'Annual Turnover')
In [19]:
means = data.groupby('Resturant Tier')['Annual Turnover'].mean()

ax = means.plot(
    kind='bar',
    logy=True,
    figsize=(6, 4),
    color=['#2a9d8f', '#e76f51']
)

ax.yaxis.set_major_formatter(lambda x, _: f"{int(x/1e6)}M")

plt.title('Media de Facturación Anual por Resturant Tier')
plt.xlabel('Resturant Tier')
plt.ylabel('Media Annual Turnover')
plt.tight_layout()
plt.show()

From the boxplot we can conclude:

  • Tier 1 restaurants average roughly 45 million while Tier 2.0 restaurants sit around 27 to 30 million; the bar plot confirms that Tier 1 restaurants out-bill Tier 2 restaurants.
  • This confirms that the establishment's tier is a decisive factor in reaching a higher annual turnover.
In [20]:
def fmt_millions(x, pos):
    return f'{int(x/1e6)}M'

plt.figure(figsize=(14, 5))

plt.subplot(1, 2, 1)
sns.regplot(
    data=data,
    x='Restaurant Zomato Rating',
    y='Annual Turnover',
    scatter_kws={'alpha':0.4, 's':30},
    line_kws={'color':'teal'}
)
plt.yscale('log')
plt.gca().yaxis.set_major_formatter(FuncFormatter(fmt_millions))
plt.title('Turnover vs Zomato Rating (log scale)')
plt.xlabel('Restaurant Zomato Rating')
plt.ylabel('Annual Turnover')
Out[20]:
Text(0, 0.5, 'Annual Turnover')
In [21]:
plt.subplot(1, 2, 2)
sns.boxplot(
    data=data,
    x='Restaurant Zomato Rating',
    y='Annual Turnover',
    color='lightcoral'
)
plt.yscale('log')
plt.gca().yaxis.set_major_formatter(FuncFormatter(fmt_millions))
plt.title('Distribución de Turnover por Zomato Rating')
plt.xlabel('Restaurant Zomato Rating')
plt.ylabel('')

plt.tight_layout()
plt.show()
  • The regression line in the scatterplot shows that, as the Restaurant Zomato Rating climbs from 0 to 5, annual turnover tends to rise.
  • The 0 rating is an odd point, with turnover above 15 million, simply because very few samples carry that value.
  • Every rating level has outliers with very high turnover, which suggests that low- or mid-rated restaurants can still bill heavily; the overall trend is nonetheless upward.
In [22]:
plt.figure(figsize=(8, 5))
ax = sns.violinplot(
    data=data,
    x='Endorsed By',
    y='Annual Turnover',
    order=['Not Specific', 'Tier A Celebrity', 'Local Celebrity'],
    palette=['lightgray','skyblue','salmon'],
    scale='count',
    inner='quartile',
)
ax.set_yscale('log')
ax.yaxis.set_major_formatter(FuncFormatter(fmt_millions))
ax.set_title('Distribución de Turnover por Endorsed By')
ax.set_xlabel('Endorsed By')
ax.set_ylabel('Annual Turnover')
plt.tight_layout()
plt.show()
  • Not Specific:
  • This category has the lowest median of all.
  • Its distribution is the narrowest, so there are few extreme values or outliers.
  • Restaurants without an endorsement show both lower variability and lower turnover.
  • Tier A Celebrity:
  • Its median is higher than Not Specific's, indicating higher annual income.
  • Local Celebrity:
  • It has the highest median of all, which suggests that restaurants backed by local celebrities can out-bill the rest.
  • In conclusion, celebrity-backed restaurants tend to have higher turnover.
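The median ordering read off the violins can be computed directly with a groupby. Sketched on toy rows (the values are invented; the column names match the dataset):

```python
import pandas as pd

toy = pd.DataFrame({
    "Endorsed By": ["Not Specific", "Not Specific", "Tier A Celebrity",
                    "Tier A Celebrity", "Local Celebrity", "Local Celebrity"],
    "Annual Turnover": [18e6, 22e6, 30e6, 34e6, 38e6, 42e6],
})
# Median turnover per endorsement category, lowest to highest
medians = toy.groupby("Endorsed By")["Annual Turnover"].median().sort_values()
print(medians)
```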
In [23]:
data_copy = data.copy()
cuisine_dummies = data_copy['Cuisine'].str.get_dummies(sep=',')
df_c = pd.concat([data_copy, cuisine_dummies], axis=1)
df_c['Num_Cuisines'] = data_copy['Cuisine'].str.count(',').fillna(0).astype(int) + 1
top_cuisines = cuisine_dummies.sum().sort_values(ascending=False).head(10)
print("Top 10 cocinas más frecuentes:")
display(top_cuisines.to_frame(name='Count'))
plt.figure(figsize=(8, 5))
sns.barplot(
    x=top_cuisines.values,
    y=top_cuisines.index,
    palette='magma'
)
plt.title('Top 10 Cocinas más Comunes')
plt.xlabel('Número de Restaurantes')
plt.ylabel('Cuisine')
plt.tight_layout()
plt.show()
Top 10 cocinas más frecuentes:
Count
tibetan 964
greek 778
thai 549
japanese 475
british 392
turkish 372
irish 358
welsh 345
algerian 305
belgian 280
In [24]:
cuisine_cols = cuisine_dummies.columns.tolist()
mean_turnover_by_cuisine = {
    cuisine: df_c.loc[df_c[cuisine] == 1, 'Annual Turnover'].mean()
    for cuisine in cuisine_cols
}
mean_turnover_by_cuisine = pd.Series(mean_turnover_by_cuisine).sort_values(ascending=False)

top10_avg = mean_turnover_by_cuisine.head(25)
print("Top cocinas por facturación anual promedio:")
display(top10_avg.to_frame(name='Avg Annual Turnover'))

plt.figure(figsize=(8, 6))
sns.barplot(
    x=top10_avg.values,
    y=top10_avg.index,
    palette='viridis'
)
plt.title('Top Cocinas según Facturación Anual Promedio')
plt.xlabel('Facturación Anual Promedio')
plt.ylabel('Tipo de Cocina')
plt.gca().xaxis.set_major_formatter(FuncFormatter(fmt_millions))
plt.tight_layout()
plt.show()
Top cocinas por facturación anual promedio:
Avg Annual Turnover
hawaiian 3.967391e+07
latvian 3.967391e+07
nigerian 3.408635e+07
tibetan 3.332158e+07
greek 3.329949e+07
jewish 3.276838e+07
polish 3.276838e+07
japanese 3.253053e+07
indian 3.215934e+07
turkish 3.160484e+07
italian 3.133813e+07
british 3.003189e+07
irish 2.994832e+07
algerian 2.944426e+07
thai 2.912386e+07
korean 2.907831e+07
peruvian 2.784783e+07
belgian 2.746071e+07
swedish 2.714130e+07
cuban 2.713010e+07
cajun 2.700000e+07
sapnish 2.664966e+07
welsh 2.600725e+07
chinese 2.500629e+07
salvadorian 2.500629e+07

The list of most common cuisines shows the top 10, led by Tibetan and Greek. When we turn to the average turnover generated by each cuisine, however, cuisines such as Hawaiian and Latvian lead the ranking, so offering somewhat more exotic cuisines could be a differentiator for increasing turnover.

In [25]:
# Dates come as day-first strings (e.g. '14-02-2009'), so parse them explicitly
data['Registration Date'] = pd.to_datetime(data['Opening Day of Restaurant'], format='%d-%m-%Y')
data['Years_Open'] = (pd.Timestamp.today() - data['Registration Date']).dt.days / 365
In [26]:
sns.scatterplot(data=data, x='Years_Open', y='Annual Turnover', alpha=0.5)
Out[26]:
<Axes: xlabel='Years_Open', ylabel='Annual Turnover'>
In [27]:
data['Age_Bin'] = pd.cut(data['Years_Open'], bins=range(0, 35, 10), right=False)
plt.figure(figsize=(8,5))
sns.boxplot(
    data=data,
    x='Age_Bin',
    y='Annual Turnover',
    color='slateblue'
)
plt.yscale('log')
plt.gca().yaxis.set_major_formatter(FuncFormatter(lambda x, _: f"{int(x/1e6)}M"))
plt.xticks(rotation=45)
plt.xlabel('Rango de años abiertos')
plt.ylabel('Annual Turnover')
plt.title('Turnover por rango de antigüedad (Years Open)')
plt.tight_layout()
plt.show()

From the boxplot above we can see that restaurants open for more than 20 years can generate a somewhat higher return, although the difference is not large. There are also young venues that bill heavily, so age alone does not guarantee higher turnover.

In [28]:
fig, ax = plt.subplots(figsize=(8, 6))

data.boxplot(
    column='Years_Open',
    by='Resturant Tier',
    grid=True,
    showfliers=True,
    patch_artist=True,
    ax=ax  # draw on this figure; without ax, pandas opens a new one and the first stays empty
)

plt.title('Distribución de Años de Funcionamiento por Restaurant Tier')
plt.suptitle('')
plt.xlabel('Restaurant Tier')
plt.ylabel('Years Open')

plt.tight_layout()
plt.show()
No description has been provided for this image

The medians are very similar; however, Tier 2.0 shows greater dispersion, while Tier 1.0 is much more compact in its interquartile range. This could indicate that Tier 1.0 restaurants need to have reached a minimum maturity, whereas a Tier 2.0 restaurant can be either new or old.

In [29]:
df_dic = data.select_dtypes(include=['int64', 'float64']).loc[:, lambda x: x.nunique() < 3]

datos_apilados = []
for col in df_dic.columns:
    conteos = df_dic[col].value_counts().sort_index()
    datos_apilados.append({
        'Variable': col,
        'Valor_0': conteos.get(0, 0),
        'Valor_1': conteos.get(1, 0),
        'Valor_2': conteos.get(2, 0)

    })

plot_data = pd.DataFrame(datos_apilados)
fig, ax = plt.subplots(figsize=(12, 6))
x_pos = range(len(plot_data))

bars1 = ax.bar(x_pos, plot_data['Valor_0'], label='Valor 0', color='lightcoral', alpha=0.8)
bars2 = ax.bar(x_pos, plot_data['Valor_1'], bottom=plot_data['Valor_0'],
                label='Valor 1', color='lightblue', alpha=0.8)
bars3 = ax.bar(x_pos, plot_data['Valor_2'],
                bottom=plot_data['Valor_0'] + plot_data['Valor_1'],
                label='Valor 2', color='lightgreen', alpha=0.8)

ax.set_ylabel('Cantidad de Observaciones')

ax.set_title('Distribución de Variables Dicotómicas (0, 1, 2)')

ax.set_xticks(x_pos)

ax.set_xticklabels(plot_data['Variable'], rotation=45, ha='right')

ax.legend()

ax.grid(True, alpha=0.3, axis='y')
No description has been provided for this image
In [30]:
from matplotlib.ticker import FuncFormatter  # not imported in the setup cell, needed below
plt.figure(figsize=(8,5))
sns.boxplot(
    data=data,
    x='Liquor License Obtained',
    y='Annual Turnover',
    color='slateblue'
)
plt.yscale('log')
plt.gca().yaxis.set_major_formatter(FuncFormatter(lambda x, _: f"{int(x/1e6)}M"))
plt.xticks(rotation=45)
plt.xlabel('Licencia de licor')
plt.ylabel('Annual Turnover')
plt.title('Turnover por licencia de licor')
plt.tight_layout()
plt.show()
No description has been provided for this image

The median turnover of restaurants licensed to sell liquor is higher than that of restaurants without a license. This makes sense, since liquor can be quite expensive when consumed alongside a meal.

In [31]:
from matplotlib.ticker import FuncFormatter  # not imported in the setup cell, needed below
plt.figure(figsize=(8,5))
sns.boxplot(
    data=data,
    x='Restaurant Type',
    y='Annual Turnover',
    color='slateblue'
)
plt.yscale('log')
plt.gca().yaxis.set_major_formatter(FuncFormatter(lambda x, _: f"{int(x/1e6)}M"))
plt.xticks(rotation=45)
plt.xlabel('Tipo de restaurantes')
plt.ylabel('Annual Turnover')
plt.title('Turnover por tipo de restaurante')
plt.tight_layout()
plt.show()
No description has been provided for this image
  • Gastro Bars have the highest median: their annual turnover exceeds that of coffee shops and bars.
  • Coffee shops show the lowest median, indicating that, in general, they bill less than the other formats.
  • Bars and Gastro Bars show a wider interquartile range (IQR), a sign of greater variability across venues of those types.
  • Coffee shops have a narrower IQR, meaning their turnover is more homogeneous.
  • The bar category has the most outliers, meaning there are establishments with very high annual turnover as well as some with very low turnover.
In [32]:
import pandas as pd
import matplotlib.pyplot as plt

means = (
    data
    .groupby('Endorsed By')['Annual Turnover']
    .mean()
    .reindex(['Not Specific','Tier A Celebrity','Local Celebrity'])
)

fig, ax = plt.subplots(figsize=(6,5))

colors = ['#4da6ff', '#ff8080', '#80ff80']

bars = ax.bar(
    means.index,
    means.values,
    color=colors,
    edgecolor='none'
)

for bar in bars:
    height = bar.get_height()
    ax.text(
        bar.get_x() + bar.get_width()/2, 
        height + height*0.01,             
        f"{height:.0f}",                   
        ha='center',
        va='bottom',
        fontsize=10
    )

ax.set_title('Average Turnover by Endorsement Type')
ax.set_ylabel('Mean Annual Turnover')
ax.set_ylim(0, means.max() * 1.1) 
ax.grid(axis='y', linestyle='--', alpha=0.5)

plt.tight_layout()
plt.show()
No description has been provided for this image

Conclusions¶

  • Endorsed restaurants clearly have higher annual turnover. Restaurants endorsed by a Tier A celebrity have a mean of 31,805,000, higher than restaurants with no specific endorsement. However, restaurants endorsed by local celebrities have the highest mean, at 38,390,625 in annual turnover. This may suggest that people are more influenced by, or follow more closely, local celebrities than other kinds of celebrities.
  • Restaurants licensed to sell liquor have a median of around 30 million, much higher than the roughly 20 million median of establishments without a license. This makes sense: gastro bars, which combine food with liquor, have the highest turnover among restaurant types.
  • The number of years open does not guarantee financial success or high annual turnover; some very young restaurants out-bill those that have been in the market far longer. We can also identify two restaurant tiers: Tier 1, the more luxury segment, and Tier 2. Averaging annual turnover by tier, Tier 1 restaurants sell around 45 million per year while Tier 2 restaurants sell around 30 million. Relating years open to tier, the values for Tier 1 restaurants are tighter and show fewer extreme values than those of Tier 2, which may indicate that reaching Tier 1 requires a certain maturity.
  • Sports broadcasts and live music are associated with higher annual turnover, which makes sense: offering this kind of entertainment attracts more customers, which translates into higher revenue.
  • Offering promotions pays off only up to a point: in the earlier boxplot, annual turnover rises from a rating of 1 up to its maximum at 4 and declines from there on, so over-optimizing deals or offering deals that are too good is not synonymous with higher annual turnover.
  • Cleanliness and hygiene are among the fundamental drivers of high annual turnover: the boxplot shows that the better the hygiene rating, the higher the attainable turnover. With a hygiene score of 9, the median is approximately 55 million, the highest of any score. Likewise, good staff responsiveness also improves annual turnover, peaking at a rating of 8, where the highest median turnover is reached.
  • Social media presence is associated with higher annual turnover: in the scatter plot, the fitted line rises with the social media quotient, indicating higher turnover.

Data cleaning¶

The following variables showed a high correlation with our target variable but contain many nulls, so instead of dropping the rows we impute the missing values with the median. The median is chosen because of its low susceptibility to noise.
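A minimal sketch of the median imputation described, using scikit-learn's SimpleImputer on a hypothetical rating column (the real pipeline would apply it to the columns listed above):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical column with missing ratings
df = pd.DataFrame({"Live Sports Rating": [8.0, np.nan, 6.0, np.nan, 9.0]})

# Median imputation: robust to the outliers seen in the turnover data
imputer = SimpleImputer(strategy="median")
df["Live Sports Rating"] = imputer.fit_transform(df[["Live Sports Rating"]])
print(df["Live Sports Rating"].tolist())
```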

In [33]:
data_def = data.drop(columns=['Live Sports Rating','Value Deals Rating','Comedy Gigs Rating','Live Music Rating'])

Evaluating interactions¶

The final model will proceed as follows: OLS is used to evaluate interactions among numerical variables, ANOVA to evaluate interactions between numerical and categorical variables, and chi-squared to evaluate interactions between categorical variables. If the p-value is below 0.05, there is a statistically significant relationship between the variables. The logic continues like this: we evaluate the relationship of the dependent variable (Annual Turnover) with the other variables (e.g., Live Sports Rating and Live Music Rating); if the p-value of each test is < 0.05, the relationship is significant and contributes to annual turnover. We then evaluate the relationship between the studied variables themselves; if that p-value is < 0.05, there is a clear interaction between them (Live Sports Rating × Live Music Rating).
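The chi-squared step for categorical–categorical pairs is not shown in the cells below. A minimal sketch with a hypothetical pair of columns (in the real data this would run on, e.g., 'Restaurant Type' and 'Endorsed By' from `data_def`):

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical stand-in for two categorical columns of the dataset
df = pd.DataFrame({
    "Restaurant Type": ["Bar", "Bar", "Cafe", "Cafe", "Bar", "Cafe"],
    "Endorsed By": ["Local Celebrity", "Local Celebrity", "Not Specific",
                    "Not Specific", "Local Celebrity", "Not Specific"],
})

# Build the contingency table and run the chi-squared test of independence;
# p-value < 0.05 would indicate the two categoricals are not independent
table = pd.crosstab(df["Restaurant Type"], df["Endorsed By"])
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p-value = {p_value:.4f}")
```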

We evaluate the impact of each variable through a simple regression, which we then assess with ANOVA; this tells us the impact of each feature on the dependent variable, annual turnover. If the p-value is < 0.05, the relationship between the variables is statistically significant.

In [34]:
from scipy.stats import shapiro

stat, p_value = shapiro(data['Annual Turnover'])

print(f"Shapiro-Wilk statistic: {stat:.4f}")
print(f"p-value: {p_value:.4f}")
Shapiro-Wilk statistic: 0.6232
p-value: 0.0000

The Shapiro-Wilk test is used to check the normality of Annual Turnover. The p-value is 0.0, below 0.05, which shows that the variable's distribution departs from normality, so we proceed with a logarithmic transformation.

In [35]:
data_def['Annual Turnover Log'] = np.log1p(data_def['Annual Turnover'])
In [36]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
data_def_test = data_def.rename(columns={
    "Annual Turnover Log": "Annual_Turnover_Log",
    "Live Sports Rating_missing": "Live_Sports_Rating_missing"
})
model = smf.ols(
    "Annual_Turnover_Log ~ Live_Sports_Rating_missing",
    data=data_def_test
).fit()
print(model.summary())
                             OLS Regression Results                            
===============================================================================
Dep. Variable:     Annual_Turnover_Log   R-squared:                       0.004
Model:                             OLS   Adj. R-squared:                  0.004
Method:                  Least Squares   F-statistic:                     14.07
Date:                 Sat, 02 Aug 2025   Prob (F-statistic):           0.000179
Time:                         15:51:11   Log-Likelihood:                -2842.1
No. Observations:                 3493   AIC:                             5688.
Df Residuals:                     3491   BIC:                             5701.
Df Model:                            1                                         
Covariance Type:             nonrobust                                         
==============================================================================================
                                 coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------------
Intercept                     16.5119      0.154    107.352      0.000      16.210      16.814
Live_Sports_Rating_missing     0.1449      0.039      3.751      0.000       0.069       0.221
==============================================================================
Omnibus:                       78.289   Durbin-Watson:                   1.981
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              164.973
Skew:                          -0.085   Prob(JB):                     1.50e-36
Kurtosis:                       4.051   Cond. No.                         70.5
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The p-value between Annual Turnover and Live_Sports_Rating_missing is 0.000, so there is a statistically significant relationship with this variable; Ordinary Least Squares is used to verify it.

In [37]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
data_def_test = data_def.rename(columns={
    "Annual Turnover Log": "Annual_Turnover_Log",
    "Value Deals Rating_missing": "Value_Deals_Rating_missing"
})
model = smf.ols(
    "Annual_Turnover_Log ~ Value_Deals_Rating_missing",
    data=data_def_test
).fit()
print(model.summary())
                             OLS Regression Results                            
===============================================================================
Dep. Variable:     Annual_Turnover_Log   R-squared:                       0.012
Model:                             OLS   Adj. R-squared:                  0.011
Method:                  Least Squares   F-statistic:                     40.82
Date:                 Sat, 02 Aug 2025   Prob (F-statistic):           1.89e-10
Time:                         11:35:01   Log-Likelihood:                -2828.8
No. Observations:                 3493   AIC:                             5662.
Df Residuals:                     3491   BIC:                             5674.
Df Model:                            1                                         
Covariance Type:             nonrobust                                         
==============================================================================================
                                 coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------------
Intercept                     16.6106      0.075    220.696      0.000      16.463      16.758
Value_Deals_Rating_missing     0.1217      0.019      6.389      0.000       0.084       0.159
==============================================================================
Omnibus:                       81.265   Durbin-Watson:                   1.988
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              172.336
Skew:                          -0.095   Prob(JB):                     3.78e-38
Kurtosis:                       4.072   Cond. No.                         34.4
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The p-value between Annual Turnover and Value_Deals_Rating_missing is 0.000, so there is a statistically significant relationship with this variable; Ordinary Least Squares is used to verify it.

In [39]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
data_def_test = data_def.rename(columns={
    "Annual Turnover Log": "Annual_Turnover_Log",
    "Live Music Rating_missing": "Live_Music_Rating_missing"
})
model = smf.ols(
    "Annual_Turnover_Log ~ Live_Music_Rating_missing",
    data=data_def_test
).fit()
print(model.summary())
                             OLS Regression Results                            
===============================================================================
Dep. Variable:     Annual_Turnover_Log   R-squared:                       0.048
Model:                             OLS   Adj. R-squared:                  0.047
Method:                  Least Squares   F-statistic:                     174.4
Date:                 Sat, 02 Aug 2025   Prob (F-statistic):           6.75e-39
Time:                         11:35:37   Log-Likelihood:                -2764.0
No. Observations:                 3493   AIC:                             5532.
Df Residuals:                     3491   BIC:                             5544.
Df Model:                            1                                         
Covariance Type:             nonrobust                                         
=============================================================================================
                                coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------------
Intercept                    16.5512      0.042    397.579      0.000      16.470      16.633
Live_Music_Rating_missing     0.1338      0.010     13.207      0.000       0.114       0.154
==============================================================================
Omnibus:                      110.629   Durbin-Watson:                   1.991
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              287.144
Skew:                          -0.071   Prob(JB):                     4.44e-63
Kurtosis:                       4.397   Cond. No.                         20.0
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The p-value between Annual Turnover and Live_Music_Rating_missing is 0.000, so there is a statistically significant relationship with this variable; Ordinary Least Squares is used to verify it.

In [36]:
data_def = data_def.drop(columns=['Comedy Gigs Rating_missing'])

The following variables are dropped because they are already represented by Years_Open, a variable that captures the restaurant's age.

In [37]:
data_def = data_def.drop(columns=['Opening Day of Restaurant','Registration Date','Age_Bin'])
In [38]:
data_def = data_def.drop(columns=['Registration Number'])

To check how some variables influence our target variable, Annual Turnover, we use ANOVA, which tells us that if the p-value is below 0.05 the variable has a significant impact on our output variable Y. This applies when exploring the relationship between a categorical variable and our continuous target Y.

In [43]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
data_def_test = data_def.rename(columns={
    "Annual Turnover Log": "Annual_Turnover_Log",
    "Hygiene Rating": "Hygiene_Rating"
})
model = smf.ols(
    "Annual_Turnover_Log ~ Hygiene_Rating",
    data=data_def_test
).fit()
print(model.summary())
                             OLS Regression Results                            
===============================================================================
Dep. Variable:     Annual_Turnover_Log   R-squared:                       0.110
Model:                             OLS   Adj. R-squared:                  0.109
Method:                  Least Squares   F-statistic:                     430.0
Date:                 Sat, 02 Aug 2025   Prob (F-statistic):           3.63e-90
Time:                         11:36:56   Log-Likelihood:                -2646.3
No. Observations:                 3493   AIC:                             5297.
Df Residuals:                     3491   BIC:                             5309.
Df Model:                            1                                         
Covariance Type:             nonrobust                                         
==================================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
Intercept         16.4173      0.033    490.096      0.000      16.352      16.483
Hygiene_Rating     0.1438      0.007     20.736      0.000       0.130       0.157
==============================================================================
Omnibus:                      146.481   Durbin-Watson:                   1.983
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              468.318
Skew:                           0.042   Prob(JB):                    2.02e-102
Kurtosis:                       4.792   Cond. No.                         19.3
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The p-value between Annual Turnover and Hygiene_Rating is 0.000, so there is a statistically significant relationship with this variable; Ordinary Least Squares is used to verify it.

In [44]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
data_def_test = data_def.rename(columns={
    "Annual Turnover Log": "Annual_Turnover_Log",
    "Staff Responsivness": "Staff_Responsivness"
})
model = smf.ols(
    "Annual_Turnover_Log ~ Staff_Responsivness",
    data=data_def_test
).fit()
print(model.summary())
                             OLS Regression Results                            
===============================================================================
Dep. Variable:     Annual_Turnover_Log   R-squared:                       0.055
Model:                             OLS   Adj. R-squared:                  0.055
Method:                  Least Squares   F-statistic:                     204.5
Date:                 Sat, 02 Aug 2025   Prob (F-statistic):           4.04e-45
Time:                         11:37:39   Log-Likelihood:                -2749.7
No. Observations:                 3493   AIC:                             5503.
Df Residuals:                     3491   BIC:                             5516.
Df Model:                            1                                         
Covariance Type:             nonrobust                                         
=======================================================================================
                          coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------
Intercept              16.5539      0.038    430.973      0.000      16.479      16.629
Staff_Responsivness     0.1177      0.008     14.300      0.000       0.102       0.134
==============================================================================
Omnibus:                       94.066   Durbin-Watson:                   1.987
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              229.444
Skew:                          -0.020   Prob(JB):                     1.50e-50
Kurtosis:                       4.255   Cond. No.                         20.8
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The p-value between Annual Turnover and Staff_Responsivness is 0.000, so there is a statistically significant relationship with this variable; Ordinary Least Squares is used to verify it.

In [45]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
data_def_test = data_def.rename(columns={
    "Annual Turnover Log": "Annual_Turnover_Log",
    "Order Wait Time": "Order_Wait_Time"
})
model = smf.ols(
    "Annual_Turnover_Log ~ Order_Wait_Time",
    data=data_def_test
).fit()
print(model.summary())
                             OLS Regression Results                            
===============================================================================
Dep. Variable:     Annual_Turnover_Log   R-squared:                       0.000
Model:                             OLS   Adj. R-squared:                 -0.000
Method:                  Least Squares   F-statistic:                    0.3292
Date:                 Sat, 02 Aug 2025   Prob (F-statistic):              0.566
Time:                         11:39:19   Log-Likelihood:                -2849.0
No. Observations:                 3493   AIC:                             5702.
Df Residuals:                     3491   BIC:                             5714.
Df Model:                            1                                         
Covariance Type:             nonrobust                                         
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept          17.0776      0.020    848.487      0.000      17.038      17.117
Order_Wait_Time     0.0019      0.003      0.574      0.566      -0.004       0.008
==============================================================================
Omnibus:                       76.054   Durbin-Watson:                   1.982
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              155.723
Skew:                          -0.095   Prob(JB):                     1.53e-34
Kurtosis:                       4.017   Cond. No.                         13.8
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The p-value between Annual Turnover and Order_Wait_Time is 0.566, so there is no statistically significant relationship with this variable; Ordinary Least Squares is used to verify it.

In [49]:
from scipy.stats import f_oneway

groups = [group["Annual Turnover Log"].values for _, group in data_def.groupby("Restaurant Type")]
f_val, p_val = f_oneway(*groups)

print(f"F-statistic: {f_val:.2f}")
print(f"P-value: {p_val:.4f}")
F-statistic: 8.97
P-value: 0.0000

There is a statistically significant relationship between annual turnover and restaurant type; ANOVA is used to verify it, with a p-value of 0.00 < 0.05.

In [58]:
groups = [group["Annual Turnover Log"].values for _, group in data_def.groupby("Cuisine")]
f_val, p_val = f_oneway(*groups)

print(f"F-statistic: {f_val:.2f}")
print(f"P-value: {p_val:.4f}")
F-statistic: 13.45
P-value: 0.0000

There is a statistically significant relationship between annual turnover and cuisine; ANOVA is used to verify it, with a p-value of 0.00 < 0.05.

In [63]:
groups = [group["Annual Turnover Log"].values for _, group in data_def.groupby("Restaurant Theme")]
f_val, p_val = f_oneway(*groups)

print(f"F-statistic: {f_val:.2f}")
print(f"P-value: {p_val:.4f}")
F-statistic: 1.13
P-value: 0.2742

There is no statistically significant relationship between Restaurant Theme and annual turnover; the p-value is 0.27, greater than 0.05, as shown by ANOVA.

In [39]:
data_def['City'] = data_def['City'].replace('-1', 'Unknown')
freqs = data_def['City'].value_counts()

ciudades_freq = freqs[freqs >= 20].index

data_def['City_Reducida'] = data_def['City'].where(data_def['City'].isin(ciudades_freq), other='Other')

City has very high dimensionality, and some cities appear only once, so low-frequency cities are grouped into an 'Other' category.

In [65]:
groups = [group["Annual Turnover Log"].values for _, group in data_def.groupby("City")]
f_val, p_val = f_oneway(*groups)

print(f"F-statistic: {f_val:.2f}")
print(f"P-value: {p_val:.4f}")
F-statistic: 2.32
P-value: 0.0000

ANOVA is used to evaluate the statistical relevance of City to annual turnover; with a p-value of 0.0, below 0.05, the relationship is statistically significant.

In [66]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
data_def_test = data_def.rename(columns={
    "Annual Turnover Log": "Annual_Turnover_Log",
    "Hygiene Rating": "Hygiene_Rating",
    "Live Music Rating_missing": "Live_Music_Rating_missing",
    "Value Deals Rating_missing": "Value_Deals_Rating_missing",
    "Staff Responsivness": "Staff_Responsivness",
    "Cuisine": "Cuisine",
    "City":"City"
})
model = smf.ols(
    "Annual_Turnover_Log ~ C(City)",
    data=data_def_test
).fit()
anova_tbl = sm.stats.anova_lm(model, typ=2)
print(anova_tbl)
              sum_sq      df         F        PR(>F)
C(City)   185.148122   296.0  2.324339  9.825746e-29
Residual  860.072071  3196.0       NaN           NaN
In [50]:
groups = [group["Annual Turnover Log"].values for _, group in data_def.groupby("Restaurant Location")]
f_val, p_val = f_oneway(*groups)

print(f"F-statistic: {f_val:.2f}")
print(f"P-value: {p_val:.4f}")
F-statistic: 2.49
P-value: 0.1145

ANOVA is used to show that the restaurant's location has no high statistical relevance; its p-value of 0.11 is greater than 0.05.

In [51]:
groups = [group["Annual Turnover Log"].values for _, group in data_def.groupby("Endorsed By")]
f_val, p_val = f_oneway(*groups)

print(f"F-statistic: {f_val:.2f}")
print(f"P-value: {p_val:.4f}")
F-statistic: 22.17
P-value: 0.0000

ANOVA is used to show the relevance of endorsement to annual turnover; with a p-value of 0.0 < 0.05, it is highly relevant.

In [52]:
groups = [group["Annual Turnover Log"].values for _, group in data_def.groupby("Resturant Tier")]
f_val, p_val = f_oneway(*groups)

print(f"F-statistic: {f_val:.2f}")
print(f"P-value: {p_val:.4f}")
F-statistic: 146.71
P-value: 0.0000

ANOVA is used to show the relevance of the tier to annual turnover; with a p-value of 0.0 < 0.05, it is highly relevant.

In [58]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
data_def_test = data_def.rename(columns={
    "Annual Turnover Log": "Annual_Turnover_Log",
    "Years_Open": "Years_Open"
})
model = smf.ols(
    "Annual_Turnover_Log ~ Years_Open",
    data=data_def_test
).fit()
print(model.summary())
                             OLS Regression Results                            
===============================================================================
Dep. Variable:     Annual_Turnover_Log   R-squared:                       0.014
Model:                             OLS   Adj. R-squared:                  0.014
Method:                  Least Squares   F-statistic:                     50.92
Date:                 Sat, 02 Aug 2025   Prob (F-statistic):           1.17e-12
Time:                         07:34:22   Log-Likelihood:                -2823.9
No. Observations:                 3493   AIC:                             5652.
Df Residuals:                     3491   BIC:                             5664.
Df Model:                            1                                         
Covariance Type:             nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     16.5016      0.083    199.583      0.000      16.339      16.664
Years_Open     0.0374      0.005      7.136      0.000       0.027       0.048
==============================================================================
Omnibus:                       84.790   Durbin-Watson:                   1.985
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              157.987
Skew:                          -0.173   Prob(JB):                     4.94e-35
Kurtosis:                       3.983   Cond. No.                         143.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Ordinary Least Squares is used to evaluate the relevance of the restaurant's years open; with a p-value of 0.000 < 0.05, it is highly relevant.

In [57]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
data_def_test = data_def.rename(columns={
    "Annual Turnover Log": "Annual_Turnover_Log",
    "Facebook Popularity Quotient": "Facebook_Popularity_Quotient"
})
model = smf.ols(
    "Annual_Turnover_Log ~ Facebook_Popularity_Quotient",
    data=data_def_test
).fit()
print(model.summary())
                             OLS Regression Results                            
===============================================================================
Dep. Variable:     Annual_Turnover_Log   R-squared:                       0.070
Model:                             OLS   Adj. R-squared:                  0.069
Method:                  Least Squares   F-statistic:                     253.8
Date:                 Sat, 02 Aug 2025   Prob (F-statistic):           3.65e-55
Time:                         07:34:13   Log-Likelihood:                -2640.0
No. Observations:                 3394   AIC:                             5284.
Df Residuals:                     3392   BIC:                             5296.
Df Model:                            1                                         
Covariance Type:             nonrobust                                         
================================================================================================
                                   coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------------------
Intercept                       15.9433      0.072    220.551      0.000      15.802      16.085
Facebook_Popularity_Quotient     0.0147      0.001     15.932      0.000       0.013       0.016
==============================================================================
Omnibus:                      124.048   Durbin-Watson:                   1.989
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              363.676
Skew:                          -0.005   Prob(JB):                     1.07e-79
Kurtosis:                       4.604   Cond. No.                         628.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Ordinary Least Squares is used to demonstrate the relevance of Facebook popularity for annual turnover; with a p-value below 0.001 (< 0.05), it is highly significant.

In [56]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
data_def_test = data_def.rename(columns={
    "Annual Turnover Log": "Annual_Turnover_Log",
    "Instagram Popularity Quotient": "Instagram_Popularity_Quotient"
})
model = smf.ols(
    "Annual_Turnover_Log ~ Instagram_Popularity_Quotient",
    data=data_def_test
).fit()
print(model.summary())
                             OLS Regression Results                            
===============================================================================
Dep. Variable:     Annual_Turnover_Log   R-squared:                       0.061
Model:                             OLS   Adj. R-squared:                  0.060
Method:                  Least Squares   F-statistic:                     221.4
Date:                 Sat, 02 Aug 2025   Prob (F-statistic):           1.38e-48
Time:                         07:34:03   Log-Likelihood:                -2701.1
No. Observations:                 3437   AIC:                             5406.
Df Residuals:                     3435   BIC:                             5418.
Df Model:                            1                                         
Covariance Type:             nonrobust                                         
=================================================================================================
                                    coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------------------
Intercept                        16.1724      0.062    259.654      0.000      16.050      16.295
Instagram_Popularity_Quotient     0.0123      0.001     14.881      0.000       0.011       0.014
==============================================================================
Omnibus:                      116.285   Durbin-Watson:                   2.001
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              319.967
Skew:                          -0.048   Prob(JB):                     3.31e-70
Kurtosis:                       4.492   Cond. No.                         517.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [45]:
import pandas as pd
from scipy.stats import chi2_contingency

contingency = pd.crosstab(data_def['City_Reducida'], data_def['Cuisine'])

chi2, p, dof, expected = chi2_contingency(contingency)

print(f"χ² = {chi2:.2f}, p-value = {p:.4f}, dof = {dof}")
print("Tabla esperada si fueran independientes:\n", expected)
χ² = 700.54, p-value = 0.0000, dof = 380
Tabla esperada si fueran independientes:
 [[ 10.60721443  24.53907816  13.14028056   6.33266533  42.90380762
   25.17234469  12.82364729   7.28256513  43.22044088  13.45691383
   32.29659319   3.7995992   18.20641283  18.52304609  14.56513026
  108.60521042  44.01202405  35.62124248  23.27254509  54.61923848]
 [  0.38362439   0.88748926   0.47523619   0.22902949   1.55167478
    0.91039221   0.46378471   0.26338391   1.56312625   0.48668766
    1.16805039   0.13741769   0.65845978   0.66991125   0.52676782
    3.92785571   1.59175494   1.28829087   0.84168337   1.97537933]
 [  0.40280561   0.93186373   0.498998     0.24048096   1.62925852
    0.95591182   0.48697395   0.27655311   1.64128257   0.51102204
    1.22645291   0.14428858   0.69138277   0.70340681   0.55310621
    4.1242485    1.67134269   1.35270541   0.88376754   2.0741483 ]
 [  4.68021758  10.82736902   5.79788148   2.79415975  18.93043229
   11.106785     5.65817349   3.21328371  19.07014028   5.93758946
   14.25021472   1.67649585   8.03320928   8.17291726   6.42656742
   47.91983968  19.41941025  15.71714858  10.26853707  24.09962783]
 [  0.46034927   1.06498712   0.57028342   0.27483539   1.86200973
    1.09247066   0.55654165   0.31606069   1.8757515    0.58402519
    1.40166046   0.16490123   0.79015173   0.8038935    0.63212139
    4.71342685   1.91010593   1.54594904   1.01002004   2.3704552 ]
 [  3.33753221   7.7211566    4.13455482   1.99255654  13.49957057
    7.92041225   4.034927     2.29144002  13.5991984    4.23418265
   10.16203836   1.19553392   5.72860006   5.82822788   4.58288005
   34.17234469  13.84826796  11.20813055   7.32264529  17.18580017]
 [  5.65845978  13.09046665   7.00973375   3.37818494  22.88720298
   13.42828514   6.84082451   3.88491268  23.05611222   7.178643
   17.2287432    2.02691096   9.71228171   9.88119095   7.76982537
   57.93587174  23.47838534  19.00229029  12.41482966  29.13684512]
 [  0.40280561   0.93186373   0.498998     0.24048096   1.62925852
    0.95591182   0.48697395   0.27655311   1.64128257   0.51102204
    1.22645291   0.14428858   0.69138277   0.70340681   0.55310621
    4.1242485    1.67134269   1.35270541   0.88376754   2.0741483 ]
 [  0.6905239    1.59748068   0.85542514   0.41225308   2.7930146
    1.63870598   0.83481248   0.47409104   2.81362725   0.87603779
    2.1024907    0.24735185   1.1852276    1.20584025   0.94818208
    7.07014028   2.86515889   2.31892356   1.51503006   3.55568279]
 [  1.68794732   3.90495276   2.09103922   1.00772975   6.82736902
    4.00572574   2.04065273   1.15888921   6.87775551   2.14142571
    5.1394217    0.60463785   2.89722302   2.9476095    2.31777841
   17.28256513   7.00372173   5.66847982   3.70340681   8.69166905]
 [  0.53707415   1.24248497   0.66533066   0.32064128   2.17234469
    1.2745491    0.6492986    0.36873747   2.18837675   0.68136273
    1.63527054   0.19238477   0.92184369   0.93787575   0.73747495
    5.498998     2.22845691   1.80360721   1.17835671   2.76553106]
 [  1.72630976   3.99370169   2.13856284   1.03063269   6.9825365
    4.09676496   2.08703121   1.1852276    7.03406814   2.19009447
    5.25622674   0.61837962   2.963069     3.01460063   2.3704552
   17.6753507    7.16289722   5.7973089    3.78757515   8.88920699]
 [  0.61379903   1.41998282   0.7603779    0.36644718   2.48267965
    1.45662754   0.74205554   0.42141426   2.501002     0.77870026
    1.86888062   0.21986831   1.05353564   1.071858     0.84282851
    6.28456914   2.5468079    2.06126539   1.34669339   3.16060693]
 [  0.42198683   0.97623819   0.52275981   0.25193244   1.70684226
    1.00143143   0.51016318   0.2897223    1.71943888   0.53535643
    1.28485543   0.15115946   0.72430575   0.73690238   0.5794446
    4.32064128   1.75093043   1.41711995   0.9258517    2.17291726]
 [  3.37589465   7.80990553   4.18207844   2.01545949  13.65473805
    8.01145147   4.08130547   2.31777841  13.75551102   4.28285142
   10.2788434    1.20927569   5.79444603   5.89521901   4.63555683
   34.56513026  14.00744346  11.33695963   7.40681363  17.3833381 ]
 [  6.21471514  14.37732608   7.69882622   3.7102777   25.13713141
   14.74835385   7.51331234   4.26681935  25.32264529   7.88434011
   18.92241626   2.22616662  10.66704838  10.85256227   8.53363871
   63.63126253  25.78643     20.87031205  13.63527054  32.00114515]
 [ 12.21843687  28.26653307  15.13627255   7.29458918  49.42084168
   28.99599198  14.77154309   8.38877756  49.78557114  15.501002
   37.20240481   4.37675351  20.97194389  21.33667335  16.77755511
  125.10220441  50.69739479  41.03206413  26.80761523  62.91583166]
 [  5.02547953  11.62610936   6.22559405   3.00028629  20.32693959
   11.92613799   6.07557973   3.45032923  20.47695391   6.37560836
   15.30146006   1.80017177   8.62582307   8.77583739   6.90065846
   51.45490982  20.85198969  16.87661036  11.0260521   25.87746922]
 [  7.59576295  17.57228743   9.4096765    4.53478385  30.72316061
   18.02576582   9.1829373    5.21500143  30.9498998    9.63641569
   23.12739765   2.72087031  13.03750358  13.26424277  10.43000286
   77.77154309  31.51674778  25.50815918  16.66533066  39.11251074]
 [  0.51789293   1.19811051   0.64156885   0.30918981   2.09476095
    1.22902949   0.62610936   0.35556828   2.11022044   0.65702834
    1.57686802   0.18551388   0.8889207    0.90438019   0.71113656
    5.30260521   2.14886917   1.73919267   1.13627255   2.6667621 ]
 [  0.44116805   1.02061265   0.54652161   0.26338391   1.78442599
    1.04695104   0.53335242   0.3028915    1.79759519   0.55969081
    1.34325794   0.15803035   0.75722874   0.77039794   0.60578299
    4.51703407   1.83051818   1.4815345    0.96793587   2.27168623]]

A chi-squared test of independence is used to assess the association between City and Cuisine; with a p-value of 0.0000 (< 0.05), the City × Cuisine interaction is worth including in the annual-turnover model.
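The chi-squared p-value says the association is non-random but not how strong it is. One common follow-up is Cramér's V, which rescales χ² to [0, 1]; a minimal sketch with an illustrative contingency table (the numbers below are made up, not the City × Cuisine counts):

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Illustrative 3x3 contingency table (stand-in for the real crosstab)
table = pd.DataFrame([[30, 10, 20],
                      [10, 40, 15],
                      [25, 12, 38]])

chi2, p, dof, expected = chi2_contingency(table)

# Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1))): an effect size
# in [0, 1] that is comparable across tables of different dimensions
n = table.to_numpy().sum()
k = min(table.shape) - 1
cramers_v = np.sqrt(chi2 / (n * k))
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, V = {cramers_v:.3f}")
```

With very large tables (here dof = 380), even weak associations produce tiny p-values, so an effect-size measure is a useful complement.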

In [67]:
import pandas as pd
from scipy.stats import chi2_contingency

contingency = pd.crosstab(data_def['Endorsed By'], data_def['Resturant Tier'])

chi2, p, dof, expected = chi2_contingency(contingency)

print(f"χ² = {chi2:.2f}, p-value = {p:.4f}, dof = {dof}")
print("Tabla esperada si fueran independientes:\n", expected)
χ² = 106.48, p-value = 0.0000, dof = 2
Tabla esperada si fueran independientes:
 [[   2.27729384   28.72270616]
 [ 142.36759582 1795.63240418]
 [ 108.35511034 1366.64488966]]

A chi-squared test of independence is used to assess the association between Endorsed By and Restaurant Tier; with a p-value of 0.0000 (< 0.05), this interaction is worth including in the annual-turnover model.

In [55]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
data_def_test = data_def.rename(columns={
    "Value Deals Rating_missing": "Value_Deals_Rating_missing",
    "Instagram Popularity Quotient": "Instagram_Popularity_Quotient"
})
model = smf.ols(
    "Value_Deals_Rating_missing ~ Instagram_Popularity_Quotient",
    data=data_def_test
).fit()
print(model.summary())
                                OLS Regression Results                                
======================================================================================
Dep. Variable:     Value_Deals_Rating_missing   R-squared:                       0.004
Model:                                    OLS   Adj. R-squared:                  0.003
Method:                         Least Squares   F-statistic:                     12.27
Date:                        Sat, 02 Aug 2025   Prob (F-statistic):           0.000466
Time:                                07:33:53   Log-Likelihood:                -1870.4
No. Observations:                        3437   AIC:                             3745.
Df Residuals:                            3435   BIC:                             3757.
Df Model:                                   1                                         
Covariance Type:                    nonrobust                                         
=================================================================================================
                                    coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------------------
Intercept                         0.6053      0.049     12.375      0.000       0.509       0.701
Instagram_Popularity_Quotient     0.0023      0.001      3.503      0.000       0.001       0.004
==============================================================================
Omnibus:                      614.092   Durbin-Watson:                   2.010
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              990.442
Skew:                          -1.308   Prob(JB):                    8.48e-216
Kurtosis:                       2.731   Cond. No.                         517.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Ordinary Least Squares is used to demonstrate the association between Instagram popularity and the missingness of the value-deals rating; with a p-value below 0.001 (< 0.05), it is statistically significant.

In [53]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
data_def_test = data_def.rename(columns={
    "Annual Turnover Log": "Annual_Turnover_Log",
    "Liquor License Obtained": "Liquour_License_Obtained"
})
model = smf.ols(
    "Annual_Turnover_Log ~ Liquour_License_Obtained",
    data=data_def_test
).fit()
print(model.summary())
                             OLS Regression Results                            
===============================================================================
Dep. Variable:     Annual_Turnover_Log   R-squared:                       0.001
Model:                             OLS   Adj. R-squared:                  0.001
Method:                  Least Squares   F-statistic:                     5.016
Date:                 Sat, 02 Aug 2025   Prob (F-statistic):             0.0252
Time:                         07:33:29   Log-Likelihood:                -2846.6
No. Observations:                 3493   AIC:                             5697.
Df Residuals:                     3491   BIC:                             5710.
Df Model:                            1                                         
Covariance Type:             nonrobust                                         
============================================================================================
                               coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------------
Intercept                   16.8978      0.085    197.881      0.000      16.730      17.065
Liquour_License_Obtained     0.1924      0.086      2.240      0.025       0.024       0.361
==============================================================================
Omnibus:                       76.538   Durbin-Watson:                   1.983
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              157.477
Skew:                          -0.094   Prob(JB):                     6.37e-35
Kurtosis:                       4.023   Cond. No.                         18.4
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Ordinary Least Squares is used to demonstrate the association between holding a liquor license and annual turnover; with a p-value of 0.025 < 0.05, it is statistically significant.

In [51]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
data_def_test = data_def.rename(columns={
    "Resturant Tier": "Resturant_Tier",
    "Years_Open": "Years_Open"
})
model = smf.ols(
    "Resturant_Tier ~ Years_Open",
    data=data_def_test
).fit()
print(model.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Resturant_Tier   R-squared:                       0.001
Model:                            OLS   Adj. R-squared:                  0.001
Method:                 Least Squares   F-statistic:                     5.119
Date:                Sat, 02 Aug 2025   Prob (F-statistic):             0.0237
Time:                        07:32:56   Log-Likelihood:                -256.74
No. Observations:                3444   AIC:                             517.5
Df Residuals:                    3442   BIC:                             529.8
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      2.0164      0.040     50.433      0.000       1.938       2.095
Years_Open    -0.0057      0.003     -2.263      0.024      -0.011      -0.001
==============================================================================
Omnibus:                     2251.996   Durbin-Watson:                   2.037
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            16894.238
Skew:                          -3.263   Prob(JB):                         0.00
Kurtosis:                      11.669   Cond. No.                         143.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Ordinary Least Squares is used to demonstrate the association between restaurant tier and years open; with a p-value of 0.024 < 0.05, it is statistically significant.

In [69]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
data_def_test = data_def.rename(columns={
    "Annual Turnover Log": "Annual_Turnover_Log",
    "Hygiene Rating": "Hygiene_Rating",
    "Live Music Rating_missing": "Live_Music_Rating_missing",
    "Value Deals Rating_missing": "Value_Deals_Rating_missing",
    "Staff Responsivness": "Staff_Responsivness",
    "Cuisine": "Cuisine",
    "City":"City",
    "Years_Open":"Years_Open",
    "Restaurant Type":"Resturant_Type"
})
model = smf.ols(
    "Years_Open ~ C(Resturant_Type)",
    data=data_def_test
).fit()
anova_tbl = sm.stats.anova_lm(model, typ=2)
print(anova_tbl)
                        sum_sq      df           F        PR(>F)
C(Resturant_Type)   973.454273     3.0  115.795724  1.746768e-71
Residual           9776.935410  3489.0         NaN           NaN

ANOVA is used to verify the association between the restaurant's years open and the restaurant type; with a p-value of 1.75e-71 (< 0.05), the interaction is relevant.

In [76]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
data_def_test = data_def.rename(columns={
    "Annual Turnover Log": "Annual_Turnover_Log",
    "Hygiene Rating": "Hygiene_Rating",
    "Live Music Rating_missing": "Live_Music_Rating_missing",
    "Value Deals Rating_missing": "Value_Deals_Rating_missing",
    "Staff Responsivness": "Staff_Responsivness",
    "Cuisine": "Cuisine",
    "City":"City",
    "Years_Open":"Years_Open",
    "Restaurant Type":"Resturant_Type",
    "Liquor License Obtained": "Liquour_License_Obtained"
})
model = smf.ols(
    "Liquour_License_Obtained ~ C(Resturant_Type)",
    data=data_def_test
).fit()
anova_tbl = sm.stats.anova_lm(model, typ=2)
print(anova_tbl)
                      sum_sq      df           F         PR(>F)
C(Resturant_Type)   7.447997     3.0  261.923886  2.632551e-153
Residual           33.070755  3489.0         NaN            NaN

ANOVA is used to verify the association between the liquor license and the restaurant type; with a p-value of 2.63e-153 (< 0.05), the interaction is relevant.

In [46]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
data_def_test = data_def.rename(columns={
    "Instagram Popularity Quotient": "Instagram_Popularity_Quotient",
    "Years_Open": "Years_Open"
})
model = smf.ols(
    "Instagram_Popularity_Quotient ~ Years_Open",
    data=data_def_test
).fit()
print(model.summary())
                                  OLS Regression Results                                 
=========================================================================================
Dep. Variable:     Instagram_Popularity_Quotient   R-squared:                       0.071
Model:                                       OLS   Adj. R-squared:                  0.071
Method:                            Least Squares   F-statistic:                     264.2
Date:                           Sat, 02 Aug 2025   Prob (F-statistic):           2.73e-57
Time:                                   11:56:44   Log-Likelihood:                -12972.
No. Observations:                           3437   AIC:                         2.595e+04
Df Residuals:                               3435   BIC:                         2.596e+04
Df Model:                                      1                                         
Covariance Type:                       nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    100.5023      1.616     62.206      0.000      97.335     103.670
Years_Open    -1.6640      0.102    -16.254      0.000      -1.865      -1.463
==============================================================================
Omnibus:                      129.931   Durbin-Watson:                   1.999
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               57.509
Skew:                           0.010   Prob(JB):                     3.25e-13
Kurtosis:                       2.367   Cond. No.                         142.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [55]:
from scipy.stats import f_oneway  # needed for the one-way ANOVA below
groups = [group["Value Deals Rating_missing"].values for _, group in data_def.groupby("Endorsed By")]
f_val, p_val = f_oneway(*groups)

print(f"F-statistic: {f_val:.2f}")
print(f"P-value: {p_val:.4f}")
F-statistic: 5.98
P-value: 0.0026
In [46]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
data_def_test = data_def.rename(columns={
    "Annual Turnover Log": "Annual_Turnover_Log",
    "Hygiene Rating": "Hygiene_Rating",
    "Live Music Rating_missing": "Live_Music_Rating_missing",
    "Value Deals Rating_missing": "Value_Deals_Rating_missing",
    "Staff Responsivness": "Staff_Responsivness",
    "Cuisine": "Cuisine",
    "City":"City",
    "Years_Open":"Years_Open",
    "Restaurant Type":"Resturant_Type",
    "Liquor License Obtained": "Liquour_License_Obtained"
})
model = smf.ols(
    "Value_Deals_Rating_missing ~ C(Live_Music_Rating_missing)",
    data=data_def_test
).fit()
anova_tbl = sm.stats.anova_lm(model, typ=2)
print(anova_tbl)
                                  sum_sq      df           F        PR(>F)
C(Live_Music_Rating_missing)   30.117479     1.0  181.584248  2.196666e-40
Residual                      579.015644  3491.0         NaN           NaN
In [51]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
data_def_test = data_def.rename(columns={
    "Annual Turnover Log": "Annual_Turnover_Log",
    "Hygiene Rating": "Hygiene_Rating",
    "Live Music Rating_missing": "Live_Music_Rating_missing",
    "Value Deals Rating_missing": "Value_Deals_Rating_missing",
    "Staff Responsivness": "Staff_Responsivness",
    "Cuisine": "Cuisine",
    "City":"City",
    "Years_Open":"Years_Open",
    "Restaurant Type":"Resturant_Type",
    "Liquor License Obtained": "Liquour_License_Obtained",
    "Resturant Tier":"Resturant_Tier",
    
})
model = smf.ols(
    "Resturant_Tier ~ C(Live_Music_Rating_missing)",
    data=data_def_test
).fit()
anova_tbl = sm.stats.anova_lm(model, typ=2)
print(anova_tbl)
                                  sum_sq      df          F        PR(>F)
C(Live_Music_Rating_missing)    6.976709     7.0  15.057172  1.692741e-19
Residual                      227.437634  3436.0        NaN           NaN
In [40]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
data_def_test = data_def.rename(columns={
    "Annual Turnover Log": "Annual_Turnover_Log",
    "Hygiene Rating": "Hygiene_Rating",
    "Live Music Rating_missing": "Live_Music_Rating_missing",
    "Value Deals Rating_missing": "Value_Deals_Rating_missing",
    "Staff Responsivness": "Staff_Responsivness",
    "Cuisine": "Cuisine",
    "City_Reducida":"City_Reducida",
    "Endorsed By":"Endorsed_By",
    "Resturant Tier":"Resturant_Tier",
    "Instagram Popularity Quotient": "Instagram_Popularity_Quotient",
    "Facebook Popularity Quotient": "Facebook_Popularity_Quotient",
    "Restaurant Type":"Resturant_Type",
    "Liquor License Obtained": "Liquour_License_Obtained",
    "Value for Money":"Value_for_Money"
})
model = smf.ols(
    "Annual_Turnover_Log ~ Hygiene_Rating+City_Reducida*Cuisine+Years_Open*Instagram_Popularity_Quotient+Liquour_License_Obtained*Resturant_Type+Resturant_Tier*Live_Music_Rating_missing+Endorsed_By*Value_Deals_Rating_missing",
    data=data_def_test
).fit()
print(model.summary())
anova_tbl = sm.stats.anova_lm(model, typ=2)
print(anova_tbl)
                             OLS Regression Results                            
===============================================================================
Dep. Variable:     Annual_Turnover_Log   R-squared:                       0.346
Model:                             OLS   Adj. R-squared:                  0.274
Method:                  Least Squares   F-statistic:                     4.827
Date:                 Sat, 02 Aug 2025   Prob (F-statistic):          5.33e-124
Time:                         16:41:08   Log-Likelihood:                -2052.3
No. Observations:                 3389   AIC:                             4775.
Df Residuals:                     3054   BIC:                             6828.
Df Model:                          334                                         
Covariance Type:             nonrobust                                         
=======================================================================================================================================
                                                                          coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------------------------------------------------------
Intercept                                                              12.8381      0.856     14.991      0.000      11.159      14.517
City_Reducida[T.Bhubaneswar]                                           -1.0613      0.488     -2.174      0.030      -2.019      -0.104
City_Reducida[T.Chandigarh]                                            -0.3547      0.488     -0.726      0.468      -1.312       0.603
City_Reducida[T.Chennai]                                               -0.8208      0.360     -2.281      0.023      -1.526      -0.115
City_Reducida[T.Greater Noida]                                         -0.0881      0.112     -0.788      0.431      -0.307       0.131
City_Reducida[T.Gurgaon]                                               -0.3137      0.238     -1.318      0.188      -0.780       0.153
City_Reducida[T.Hyderabad]                                             -0.2597      0.217     -1.195      0.232      -0.686       0.167
City_Reducida[T.Indore]                                                -0.4192      0.111     -3.760      0.000      -0.638      -0.201
City_Reducida[T.Jaipur]                                                -0.6117      0.360     -1.700      0.089      -1.317       0.094
City_Reducida[T.Kolkata]                                               -0.4247      0.080     -5.281      0.000      -0.582      -0.267
City_Reducida[T.Lucknow]                                               -0.8882      0.490     -1.814      0.070      -1.849       0.072
City_Reducida[T.Mumbai]                                                 0.1055      0.080      1.318      0.187      -0.051       0.262
City_Reducida[T.Mysore]                                                -0.1408      0.105     -1.335      0.182      -0.348       0.066
City_Reducida[T.Navi Mumbai]                                           -0.0697      0.128     -0.545      0.586      -0.320       0.181
City_Reducida[T.New Delhi]                                             -0.1939      0.305     -0.635      0.525      -0.792       0.405
City_Reducida[T.Noida]                                                 -0.4280      0.239     -1.791      0.073      -0.897       0.041
City_Reducida[T.Other]                                                 -0.6752      0.205     -3.296      0.001      -1.077      -0.274
City_Reducida[T.Pune]                                                  -0.3101      0.227     -1.369      0.171      -0.754       0.134
City_Reducida[T.Unknown]                                               -0.2056      0.238     -0.865      0.387      -0.672       0.261
City_Reducida[T.chennai]                                               -0.2019      0.360     -0.561      0.575      -0.907       0.503
City_Reducida[T.pune]                                                  -0.0529      0.111     -0.477      0.633      -0.270       0.164
Cuisine[T.algerian,belgian]                                            -0.2137      0.165     -1.292      0.196      -0.538       0.111
Cuisine[T.algerian,korean]                                             -0.2774      0.205     -1.352      0.176      -0.680       0.125
Cuisine[T.british,belgian]                                             -0.0729      0.217     -0.335      0.737      -0.499       0.353
Cuisine[T.british,japanese]                                            -0.2633      0.162     -1.628      0.104      -0.581       0.054
Cuisine[T.chinese,salvadorian]                                         -0.3488      0.184     -1.901      0.057      -0.709       0.011
Cuisine[T.cuban,british]                                               -0.2690      0.211     -1.274      0.203      -0.683       0.145
Cuisine[T.hawaiian,latvian]                                            -0.1843      0.305     -0.605      0.545      -0.782       0.413
Cuisine[T.indian,irish]                                                -0.1599      0.157     -1.019      0.308      -0.468       0.148
Cuisine[T.irish,belgian]                                               -0.5203      0.217     -2.393      0.017      -0.947      -0.094
Cuisine[T.japanese,thai]                                               -0.1400      0.170     -0.825      0.410      -0.473       0.193
Cuisine[T.nigerian,cajun]                                               0.4274      0.305      1.402      0.161      -0.171       1.025
Cuisine[T.peruvian,cuban]                                              -0.3606      0.186     -1.936      0.053      -0.726       0.005
Cuisine[T.polish,jewish]                                               -0.4425      0.177     -2.496      0.013      -0.790      -0.095
Cuisine[T.swedish,greek]                                               -0.4794      0.181     -2.646      0.008      -0.835      -0.124
Cuisine[T.tibetan,greek]                                               -0.1103      0.147     -0.752      0.452      -0.398       0.177
Cuisine[T.tibetan,italian]                                             -0.1407      0.159     -0.888      0.375      -0.452       0.170
Cuisine[T.turkish,nigerian]                                            -0.0941      0.160     -0.587      0.557      -0.408       0.220
Cuisine[T.turkish,sapnish]                                             -0.3372      0.178     -1.896      0.058      -0.686       0.011
Cuisine[T.welsh,thai]                                                  -0.1801      0.164     -1.101      0.271      -0.501       0.141
Resturant_Type[T.Buffet/Family Restaurant]                              0.2760      0.236      1.167      0.243      -0.188       0.740
Resturant_Type[T.Caffee]                                                4.2115      0.290     14.529      0.000       3.643       4.780
Resturant_Type[T.Gastro Bar]                                           -0.0648      0.038     -1.684      0.092      -0.140       0.011
Endorsed_By[T.Not Specific]                                            -0.2509      0.979     -0.256      0.798      -2.170       1.668
Endorsed_By[T.Tier A Celebrity]                                        -0.0006      0.980     -0.001      1.000      -1.922       1.921
City_Reducida[T.Bhubaneswar]:Cuisine[T.algerian,belgian]               -0.0423      0.681     -0.062      0.951      -1.378       1.294
City_Reducida[T.Chandigarh]:Cuisine[T.algerian,belgian]              4.189e-14   4.09e-13      0.102      0.919   -7.61e-13    8.45e-13
City_Reducida[T.Chennai]:Cuisine[T.algerian,belgian]                    0.5001      0.405      1.234      0.217      -0.295       1.295
City_Reducida[T.Greater Noida]:Cuisine[T.algerian,belgian]              0.3280      0.444      0.738      0.461      -0.543       1.199
City_Reducida[T.Gurgaon]:Cuisine[T.algerian,belgian]                    0.3519      0.290      1.215      0.225      -0.216       0.920
City_Reducida[T.Hyderabad]:Cuisine[T.algerian,belgian]                  0.1804      0.270      0.668      0.504      -0.349       0.710
City_Reducida[T.Indore]:Cuisine[T.algerian,belgian]                 -4.166e-14   1.71e-13     -0.244      0.807   -3.76e-13    2.93e-13
City_Reducida[T.Jaipur]:Cuisine[T.algerian,belgian]                    -0.2671      0.596     -0.448      0.654      -1.436       0.902
City_Reducida[T.Kolkata]:Cuisine[T.algerian,belgian]                    0.2645      0.279      0.947      0.344      -0.283       0.812
City_Reducida[T.Lucknow]:Cuisine[T.algerian,belgian]                    0.6394      0.682      0.938      0.349      -0.698       1.977
City_Reducida[T.Mumbai]:Cuisine[T.algerian,belgian]                     0.5217      0.272      1.915      0.056      -0.012       1.056
City_Reducida[T.Mysore]:Cuisine[T.algerian,belgian]                  1.743e-14   3.31e-14      0.527      0.598   -4.74e-14    8.23e-14
City_Reducida[T.Navi Mumbai]:Cuisine[T.algerian,belgian]               -1.0581      0.438     -2.416      0.016      -1.917      -0.199
City_Reducida[T.New Delhi]:Cuisine[T.algerian,belgian]                 -0.0401      0.380     -0.106      0.916      -0.784       0.704
City_Reducida[T.Noida]:Cuisine[T.algerian,belgian]                      0.3011      0.309      0.974      0.330      -0.305       0.907
City_Reducida[T.Other]:Cuisine[T.algerian,belgian]                      0.3170      0.234      1.356      0.175      -0.141       0.775
City_Reducida[T.Pune]:Cuisine[T.algerian,belgian]                       0.2124      0.320      0.664      0.507      -0.415       0.840
City_Reducida[T.Unknown]:Cuisine[T.algerian,belgian]                    0.0982      0.277      0.354      0.723      -0.445       0.642
City_Reducida[T.chennai]:Cuisine[T.algerian,belgian]                    0.2101      0.496      0.424      0.672      -0.762       1.182
City_Reducida[T.pune]:Cuisine[T.algerian,belgian]                      -0.3777      0.329     -1.150      0.250      -1.022       0.266
City_Reducida[T.Bhubaneswar]:Cuisine[T.algerian,korean]                 0.6277      0.692      0.907      0.364      -0.729       1.984
City_Reducida[T.Chandigarh]:Cuisine[T.algerian,korean]               2.331e-14   1.43e-13      0.163      0.870   -2.56e-13    3.03e-13
City_Reducida[T.Chennai]:Cuisine[T.algerian,korean]                     0.7648      0.442      1.732      0.083      -0.101       1.631
City_Reducida[T.Greater Noida]:Cuisine[T.algerian,korean]              -0.3074      0.346     -0.888      0.375      -0.986       0.371
City_Reducida[T.Gurgaon]:Cuisine[T.algerian,korean]                     0.2870      0.339      0.847      0.397      -0.378       0.952
City_Reducida[T.Hyderabad]:Cuisine[T.algerian,korean]                  -0.0769      0.291     -0.264      0.792      -0.647       0.494
City_Reducida[T.Indore]:Cuisine[T.algerian,korean]                  -1.735e-14   3.71e-14     -0.468      0.640   -9.01e-14    5.54e-14
City_Reducida[T.Jaipur]:Cuisine[T.algerian,korean]                   3.056e-14   1.61e-13      0.190      0.849   -2.85e-13    3.46e-13
City_Reducida[T.Kolkata]:Cuisine[T.algerian,korean]                    -0.0020      0.352     -0.006      0.996      -0.692       0.688
City_Reducida[T.Lucknow]:Cuisine[T.algerian,korean]                 -1.567e-14   3.49e-14     -0.449      0.653    -8.4e-14    5.27e-14
City_Reducida[T.Mumbai]:Cuisine[T.algerian,korean]                  -2.468e-14    2.3e-13     -0.107      0.915   -4.76e-13    4.27e-13
City_Reducida[T.Mysore]:Cuisine[T.algerian,korean]                     -0.2471      0.350     -0.707      0.480      -0.932       0.438
City_Reducida[T.Navi Mumbai]:Cuisine[T.algerian,korean]             -8.952e-14   3.61e-13     -0.248      0.804   -7.98e-13    6.19e-13
City_Reducida[T.New Delhi]:Cuisine[T.algerian,korean]                  -0.1058      0.434     -0.244      0.807      -0.956       0.745
City_Reducida[T.Noida]:Cuisine[T.algerian,korean]                       0.3798      0.322      1.181      0.238      -0.251       1.010
City_Reducida[T.Other]:Cuisine[T.algerian,korean]                       0.6354      0.277      2.294      0.022       0.092       1.179
City_Reducida[T.Pune]:Cuisine[T.algerian,korean]                       -0.2390      0.342     -0.699      0.485      -0.910       0.432
City_Reducida[T.Unknown]:Cuisine[T.algerian,korean]                    -0.3829      0.350     -1.093      0.274      -1.070       0.304
City_Reducida[T.chennai]:Cuisine[T.algerian,korean]                    -0.4977      0.613     -0.812      0.417      -1.700       0.705
City_Reducida[T.pune]:Cuisine[T.algerian,korean]                    -8.375e-14   3.16e-13     -0.265      0.791   -7.04e-13    5.37e-13
City_Reducida[T.Bhubaneswar]:Cuisine[T.british,belgian]                 0.4962      0.696      0.713      0.476      -0.869       1.861
City_Reducida[T.Chandigarh]:Cuisine[T.british,belgian]                2.06e-14   5.43e-14      0.379      0.704   -8.59e-14    1.27e-13
City_Reducida[T.Chennai]:Cuisine[T.british,belgian]                     0.7627      0.613      1.244      0.213      -0.439       1.964
City_Reducida[T.Greater Noida]:Cuisine[T.british,belgian]              -0.6308      0.463     -1.363      0.173      -1.538       0.277
City_Reducida[T.Gurgaon]:Cuisine[T.british,belgian]                    -0.1146      0.441     -0.260      0.795      -0.980       0.751
City_Reducida[T.Hyderabad]:Cuisine[T.british,belgian]                   0.0530      0.344      0.154      0.878      -0.622       0.728
City_Reducida[T.Indore]:Cuisine[T.british,belgian]                  -2.034e-14   9.09e-14     -0.224      0.823   -1.99e-13    1.58e-13
City_Reducida[T.Jaipur]:Cuisine[T.british,belgian]                  -8.145e-15   4.52e-14     -0.180      0.857   -9.67e-14    8.05e-14
City_Reducida[T.Kolkata]:Cuisine[T.british,belgian]                     0.3025      0.476      0.635      0.525      -0.632       1.237
City_Reducida[T.Lucknow]:Cuisine[T.british,belgian]                  9.587e-14   4.17e-13      0.230      0.818   -7.22e-13    9.13e-13
City_Reducida[T.Mumbai]:Cuisine[T.british,belgian]                    1.01e-14   7.47e-14      0.135      0.893   -1.36e-13    1.57e-13
City_Reducida[T.Mysore]:Cuisine[T.british,belgian]                   1.381e-14   1.04e-13      0.133      0.895    -1.9e-13    2.18e-13
City_Reducida[T.Navi Mumbai]:Cuisine[T.british,belgian]             -5.041e-14   2.06e-13     -0.245      0.807   -4.54e-13    3.53e-13
City_Reducida[T.New Delhi]:Cuisine[T.british,belgian]                  -0.4951      0.479     -1.034      0.301      -1.434       0.444
City_Reducida[T.Noida]:Cuisine[T.british,belgian]                       0.2549      0.373      0.684      0.494      -0.476       0.986
City_Reducida[T.Other]:Cuisine[T.british,belgian]                       0.2255      0.306      0.737      0.461      -0.374       0.825
City_Reducida[T.Pune]:Cuisine[T.british,belgian]                        0.2131      0.434      0.491      0.623      -0.637       1.063
City_Reducida[T.Unknown]:Cuisine[T.british,belgian]                     0.0450      0.372      0.121      0.904      -0.684       0.774
City_Reducida[T.chennai]:Cuisine[T.british,belgian]                  8.355e-14   3.51e-13      0.238      0.812   -6.05e-13    7.73e-13
City_Reducida[T.pune]:Cuisine[T.british,belgian]                    -1.202e-13   4.66e-13     -0.258      0.796   -1.03e-12    7.93e-13
City_Reducida[T.Bhubaneswar]:Cuisine[T.british,japanese]                0.7093      0.595      1.193      0.233      -0.457       1.875
City_Reducida[T.Chandigarh]:Cuisine[T.british,japanese]                 0.1787      0.683      0.262      0.794      -1.161       1.518
City_Reducida[T.Chennai]:Cuisine[T.british,japanese]                    0.8005      0.387      2.068      0.039       0.041       1.560
City_Reducida[T.Greater Noida]:Cuisine[T.british,japanese]          -4.372e-14   2.04e-13     -0.214      0.830   -4.44e-13    3.56e-13
City_Reducida[T.Gurgaon]:Cuisine[T.british,japanese]                    0.0805      0.271      0.297      0.767      -0.452       0.613
City_Reducida[T.Hyderabad]:Cuisine[T.british,japanese]                 -0.0508      0.253     -0.201      0.841      -0.546       0.444
City_Reducida[T.Indore]:Cuisine[T.british,japanese]                     0.4944      0.330      1.497      0.134      -0.153       1.142
City_Reducida[T.Jaipur]:Cuisine[T.british,japanese]                     0.5311      0.436      1.218      0.223      -0.324       1.386
City_Reducida[T.Kolkata]:Cuisine[T.british,japanese]                   -0.2012      0.246     -0.816      0.414      -0.684       0.282
City_Reducida[T.Lucknow]:Cuisine[T.british,japanese]                   -0.1542      0.682     -0.226      0.821      -1.491       1.182
City_Reducida[T.Mumbai]:Cuisine[T.british,japanese]                    -0.0692      0.164     -0.422      0.673      -0.391       0.253
City_Reducida[T.Mysore]:Cuisine[T.british,japanese]                    -0.1817      0.329     -0.552      0.581      -0.827       0.464
City_Reducida[T.Navi Mumbai]:Cuisine[T.british,japanese]            -5.839e-14    1.7e-13     -0.344      0.731   -3.91e-13    2.74e-13
City_Reducida[T.New Delhi]:Cuisine[T.british,japanese]                 -0.0620      0.356     -0.174      0.862      -0.759       0.635
City_Reducida[T.Noida]:Cuisine[T.british,japanese]                      0.3119      0.265      1.178      0.239      -0.207       0.831
City_Reducida[T.Other]:Cuisine[T.british,japanese]                      0.5672      0.230      2.462      0.014       0.115       1.019
City_Reducida[T.Pune]:Cuisine[T.british,japanese]                       0.4850      0.258      1.881      0.060      -0.021       0.991
City_Reducida[T.Unknown]:Cuisine[T.british,japanese]                    0.1698      0.263      0.646      0.518      -0.345       0.685
City_Reducida[T.chennai]:Cuisine[T.british,japanese]                   -0.1147      0.495     -0.232      0.817      -1.085       0.855
City_Reducida[T.pune]:Cuisine[T.british,japanese]                    2.346e-14    5.1e-14      0.460      0.646   -7.66e-14    1.23e-13
City_Reducida[T.Bhubaneswar]:Cuisine[T.chinese,salvadorian]             0.7321      0.601      1.219      0.223      -0.446       1.910
City_Reducida[T.Chandigarh]:Cuisine[T.chinese,salvadorian]           6.396e-15   6.15e-14      0.104      0.917   -1.14e-13    1.27e-13
City_Reducida[T.Chennai]:Cuisine[T.chinese,salvadorian]                 0.5774      0.412      1.400      0.162      -0.231       1.386
City_Reducida[T.Greater Noida]:Cuisine[T.chinese,salvadorian]          -0.0645      0.336     -0.192      0.848      -0.724       0.595
City_Reducida[T.Gurgaon]:Cuisine[T.chinese,salvadorian]                 0.3541      0.353      1.002      0.316      -0.339       1.047
City_Reducida[T.Hyderabad]:Cuisine[T.chinese,salvadorian]               0.4812      0.282      1.709      0.088      -0.071       1.033
City_Reducida[T.Indore]:Cuisine[T.chinese,salvadorian]                  0.0605      0.339      0.179      0.858      -0.604       0.725
City_Reducida[T.Jaipur]:Cuisine[T.chinese,salvadorian]               8.439e-15   3.25e-14      0.260      0.795   -5.53e-14    7.22e-14
City_Reducida[T.Kolkata]:Cuisine[T.chinese,salvadorian]                -0.0335      0.341     -0.098      0.922      -0.703       0.636
City_Reducida[T.Lucknow]:Cuisine[T.chinese,salvadorian]              4.074e-14   1.42e-13      0.287      0.774   -2.37e-13    3.19e-13
City_Reducida[T.Mumbai]:Cuisine[T.chinese,salvadorian]                  0.1587      0.333      0.477      0.633      -0.494       0.811
City_Reducida[T.Mysore]:Cuisine[T.chinese,salvadorian]                  0.3954      0.453      0.874      0.382      -0.492       1.283
City_Reducida[T.Navi Mumbai]:Cuisine[T.chinese,salvadorian]          4.389e-14    2.1e-13      0.209      0.834   -3.68e-13    4.56e-13
City_Reducida[T.New Delhi]:Cuisine[T.chinese,salvadorian]              -0.0937      0.354     -0.265      0.791      -0.787       0.600
City_Reducida[T.Noida]:Cuisine[T.chinese,salvadorian]                   0.0815      0.304      0.268      0.789      -0.515       0.678
City_Reducida[T.Other]:Cuisine[T.chinese,salvadorian]                   0.5402      0.246      2.194      0.028       0.057       1.023
City_Reducida[T.Pune]:Cuisine[T.chinese,salvadorian]                    0.2159      0.294      0.733      0.464      -0.361       0.793
City_Reducida[T.Unknown]:Cuisine[T.chinese,salvadorian]                -0.1055      0.284     -0.371      0.710      -0.662       0.451
City_Reducida[T.chennai]:Cuisine[T.chinese,salvadorian]                 0.2602      0.601      0.433      0.665      -0.918       1.438
City_Reducida[T.pune]:Cuisine[T.chinese,salvadorian]                    0.1071      0.450      0.238      0.812      -0.775       0.989
City_Reducida[T.Bhubaneswar]:Cuisine[T.cuban,british]                2.075e-14   5.91e-14      0.351      0.725    -9.5e-14    1.37e-13
City_Reducida[T.Chandigarh]:Cuisine[T.cuban,british]                 6.523e-15   3.64e-14      0.179      0.858   -6.49e-14     7.8e-14
City_Reducida[T.Chennai]:Cuisine[T.cuban,british]                       0.7963      0.430      1.852      0.064      -0.047       1.639
City_Reducida[T.Greater Noida]:Cuisine[T.cuban,british]              3.801e-14    1.2e-13      0.317      0.751   -1.97e-13    2.73e-13
City_Reducida[T.Gurgaon]:Cuisine[T.cuban,british]                       0.1350      0.329      0.410      0.682      -0.510       0.780
City_Reducida[T.Hyderabad]:Cuisine[T.cuban,british]                    -0.1298      0.300     -0.432      0.666      -0.719       0.459
City_Reducida[T.Indore]:Cuisine[T.cuban,british]                       -0.0734      0.468     -0.157      0.875      -0.991       0.845
City_Reducida[T.Jaipur]:Cuisine[T.cuban,british]                    -7.073e-16   3.37e-14     -0.021      0.983   -6.69e-14    6.54e-14
City_Reducida[T.Kolkata]:Cuisine[T.cuban,british]                      -0.2682      0.474     -0.566      0.571      -1.197       0.660
City_Reducida[T.Lucknow]:Cuisine[T.cuban,british]                       1.5153      0.695      2.181      0.029       0.153       2.878
City_Reducida[T.Mumbai]:Cuisine[T.cuban,british]                     1.426e-14   4.07e-14      0.351      0.726   -6.55e-14     9.4e-14
City_Reducida[T.Mysore]:Cuisine[T.cuban,british]                       -0.4208      0.462     -0.911      0.362      -1.326       0.485
City_Reducida[T.Navi Mumbai]:Cuisine[T.cuban,british]               -3.182e-15   4.44e-14     -0.072      0.943   -9.03e-14    8.39e-14
City_Reducida[T.New Delhi]:Cuisine[T.cuban,british]                     0.1334      0.476      0.280      0.780      -0.801       1.067
City_Reducida[T.Noida]:Cuisine[T.cuban,british]                         0.0914      0.330      0.277      0.782      -0.556       0.739
City_Reducida[T.Other]:Cuisine[T.cuban,british]                         0.4720      0.291      1.620      0.105      -0.099       1.043
City_Reducida[T.Pune]:Cuisine[T.cuban,british]                          0.0820      0.361      0.227      0.820      -0.626       0.790
City_Reducida[T.Unknown]:Cuisine[T.cuban,british]                       0.1675      0.311      0.538      0.591      -0.443       0.778
City_Reducida[T.chennai]:Cuisine[T.cuban,british]                    4.262e-14   1.32e-13      0.322      0.747   -2.17e-13    3.02e-13
City_Reducida[T.pune]:Cuisine[T.cuban,british]                        -4.7e-15    5.7e-14     -0.082      0.934   -1.17e-13    1.07e-13
City_Reducida[T.Bhubaneswar]:Cuisine[T.hawaiian,latvian]             6.557e-14   2.59e-13      0.254      0.800   -4.41e-13    5.73e-13
City_Reducida[T.Chandigarh]:Cuisine[T.hawaiian,latvian]                 0.5521      0.734      0.752      0.452      -0.887       1.992
City_Reducida[T.Chennai]:Cuisine[T.hawaiian,latvian]                    0.6161      0.558      1.103      0.270      -0.479       1.711
City_Reducida[T.Greater Noida]:Cuisine[T.hawaiian,latvian]           4.889e-14   1.89e-13      0.259      0.796   -3.21e-13    4.19e-13
City_Reducida[T.Gurgaon]:Cuisine[T.hawaiian,latvian]                    0.3360      0.489      0.687      0.492      -0.623       1.295
City_Reducida[T.Hyderabad]:Cuisine[T.hawaiian,latvian]               4.218e-05      0.440   9.59e-05      1.000      -0.862       0.862
City_Reducida[T.Indore]:Cuisine[T.hawaiian,latvian]                  5.517e-14   2.49e-13      0.221      0.825   -4.34e-13    5.44e-13
City_Reducida[T.Jaipur]:Cuisine[T.hawaiian,latvian]                     0.4795      0.562      0.853      0.394      -0.623       1.582
City_Reducida[T.Kolkata]:Cuisine[T.hawaiian,latvian]                 8.705e-14   3.55e-13      0.245      0.806   -6.08e-13    7.83e-13
City_Reducida[T.Lucknow]:Cuisine[T.hawaiian,latvian]                -4.566e-14   2.11e-13     -0.216      0.829    -4.6e-13    3.69e-13
City_Reducida[T.Mumbai]:Cuisine[T.hawaiian,latvian]                  5.416e-15    4.3e-14      0.126      0.900   -7.88e-14    8.97e-14
City_Reducida[T.Mysore]:Cuisine[T.hawaiian,latvian]                  5.461e-14   2.32e-13      0.236      0.814      -4e-13    5.09e-13
City_Reducida[T.Navi Mumbai]:Cuisine[T.hawaiian,latvian]             2.623e-15   7.77e-15      0.338      0.736   -1.26e-14    1.79e-14
City_Reducida[T.New Delhi]:Cuisine[T.hawaiian,latvian]                 -0.1390      0.450     -0.309      0.757      -1.021       0.743
City_Reducida[T.Noida]:Cuisine[T.hawaiian,latvian]                     -0.2696      0.489     -0.552      0.581      -1.228       0.689
City_Reducida[T.Other]:Cuisine[T.hawaiian,latvian]                      0.9903      0.370      2.679      0.007       0.265       1.715
City_Reducida[T.Pune]:Cuisine[T.hawaiian,latvian]                      -0.0543      0.444     -0.122      0.903      -0.925       0.816
City_Reducida[T.Unknown]:Cuisine[T.hawaiian,latvian]                   -0.0284      0.387     -0.074      0.941      -0.787       0.730
City_Reducida[T.chennai]:Cuisine[T.hawaiian,latvian]                -2.798e-14   1.38e-13     -0.202      0.840   -2.99e-13    2.43e-13
City_Reducida[T.pune]:Cuisine[T.hawaiian,latvian]                    -2.39e-14   1.14e-13     -0.209      0.834   -2.48e-13       2e-13
City_Reducida[T.Bhubaneswar]:Cuisine[T.indian,irish]                 1.406e-14   2.65e-14      0.530      0.596    -3.8e-14    6.61e-14
City_Reducida[T.Chandigarh]:Cuisine[T.indian,irish]                  9.366e-15   1.62e-14      0.578      0.563   -2.24e-14    4.11e-14
City_Reducida[T.Chennai]:Cuisine[T.indian,irish]                        0.5230      0.381      1.374      0.170      -0.224       1.269
City_Reducida[T.Greater Noida]:Cuisine[T.indian,irish]                  0.7336      0.445      1.647      0.100      -0.140       1.607
City_Reducida[T.Gurgaon]:Cuisine[T.indian,irish]                        0.0662      0.277      0.239      0.811      -0.477       0.609
City_Reducida[T.Hyderabad]:Cuisine[T.indian,irish]                      0.1934      0.253      0.764      0.445      -0.303       0.690
City_Reducida[T.Indore]:Cuisine[T.indian,irish]                         0.2667      0.328      0.814      0.416      -0.376       0.909
City_Reducida[T.Jaipur]:Cuisine[T.indian,irish]                         0.7280      0.595      1.225      0.221      -0.438       1.894
City_Reducida[T.Kolkata]:Cuisine[T.indian,irish]                       -0.0047      0.180     -0.026      0.979      -0.357       0.348
City_Reducida[T.Lucknow]:Cuisine[T.indian,irish]                        0.3662      0.681      0.538      0.591      -0.968       1.701
City_Reducida[T.Mumbai]:Cuisine[T.indian,irish]                        -0.0846      0.170     -0.499      0.618      -0.417       0.248
City_Reducida[T.Mysore]:Cuisine[T.indian,irish]                       8.55e-14   3.12e-13      0.274      0.784   -5.27e-13    6.98e-13
City_Reducida[T.Navi Mumbai]:Cuisine[T.indian,irish]                    0.1850      0.436      0.425      0.671      -0.669       1.039
City_Reducida[T.New Delhi]:Cuisine[T.indian,irish]                     -0.0784      0.334     -0.235      0.814      -0.733       0.576
City_Reducida[T.Noida]:Cuisine[T.indian,irish]                          0.2614      0.269      0.973      0.330      -0.265       0.788
City_Reducida[T.Other]:Cuisine[T.indian,irish]                          0.4838      0.224      2.164      0.031       0.046       0.922
City_Reducida[T.Pune]:Cuisine[T.indian,irish]                          -0.0785      0.279     -0.281      0.778      -0.626       0.469
City_Reducida[T.Unknown]:Cuisine[T.indian,irish]                        0.0778      0.265      0.294      0.769      -0.441       0.597
City_Reducida[T.chennai]:Cuisine[T.indian,irish]                       -0.5100      0.493     -1.034      0.301      -1.477       0.457
City_Reducida[T.pune]:Cuisine[T.indian,irish]                           0.6676      0.441      1.513      0.130      -0.198       1.533
City_Reducida[T.Bhubaneswar]:Cuisine[T.irish,belgian]               -2.785e-14    1.1e-13     -0.254      0.800   -2.43e-13    1.87e-13
City_Reducida[T.Chandigarh]:Cuisine[T.irish,belgian]                    0.3054      0.613      0.498      0.618      -0.896       1.507
City_Reducida[T.Chennai]:Cuisine[T.irish,belgian]                       0.8272      0.516      1.603      0.109      -0.184       1.839
City_Reducida[T.Greater Noida]:Cuisine[T.irish,belgian]                 0.3554      0.462      0.769      0.442      -0.550       1.261
City_Reducida[T.Gurgaon]:Cuisine[T.irish,belgian]                       0.4510      0.372      1.212      0.225      -0.278       1.180
City_Reducida[T.Hyderabad]:Cuisine[T.irish,belgian]                     0.1574      0.344      0.457      0.648      -0.518       0.833
City_Reducida[T.Indore]:Cuisine[T.irish,belgian]                        0.5017      0.284      1.765      0.078      -0.056       1.059
City_Reducida[T.Jaipur]:Cuisine[T.irish,belgian]                        1.3143      0.612      2.147      0.032       0.114       2.515
City_Reducida[T.Kolkata]:Cuisine[T.irish,belgian]                      -0.3254      0.476     -0.684      0.494      -1.258       0.607
City_Reducida[T.Lucknow]:Cuisine[T.irish,belgian]                       1.2671      0.583      2.172      0.030       0.123       2.411
City_Reducida[T.Mumbai]:Cuisine[T.irish,belgian]                       -0.1438      0.463     -0.310      0.756      -1.052       0.764
City_Reducida[T.Mysore]:Cuisine[T.irish,belgian]                    -4.571e-14   1.86e-13     -0.246      0.806    -4.1e-13    3.19e-13
City_Reducida[T.Navi Mumbai]:Cuisine[T.irish,belgian]                8.453e-15   2.51e-14      0.336      0.737   -4.08e-14    5.77e-14
City_Reducida[T.New Delhi]:Cuisine[T.irish,belgian]                    -0.3364      0.390     -0.863      0.388      -1.100       0.428
City_Reducida[T.Noida]:Cuisine[T.irish,belgian]                         0.0890      0.329      0.270      0.787      -0.557       0.735
City_Reducida[T.Other]:Cuisine[T.irish,belgian]                         0.5545      0.282      1.965      0.050       0.001       1.108
City_Reducida[T.Pune]:Cuisine[T.irish,belgian]                          0.2632      0.365      0.721      0.471      -0.453       0.979
City_Reducida[T.Unknown]:Cuisine[T.irish,belgian]                      -0.2224      0.329     -0.677      0.499      -0.867       0.422
City_Reducida[T.chennai]:Cuisine[T.irish,belgian]                   -1.155e-14   6.75e-14     -0.171      0.864   -1.44e-13    1.21e-13
City_Reducida[T.pune]:Cuisine[T.irish,belgian]                        -2.4e-14   9.23e-14     -0.260      0.795   -2.05e-13    1.57e-13
City_Reducida[T.Bhubaneswar]:Cuisine[T.japanese,thai]                   0.1702      0.597      0.285      0.776      -1.001       1.341
City_Reducida[T.Chandigarh]:Cuisine[T.japanese,thai]                 6.798e-14   2.64e-13      0.257      0.797    -4.5e-13    5.86e-13
City_Reducida[T.Chennai]:Cuisine[T.japanese,thai]                       0.8066      0.379      2.128      0.033       0.063       1.550
City_Reducida[T.Greater Noida]:Cuisine[T.japanese,thai]                 0.0968      0.330      0.293      0.769      -0.551       0.745
City_Reducida[T.Gurgaon]:Cuisine[T.japanese,thai]                       0.2065      0.311      0.665      0.506      -0.402       0.816
City_Reducida[T.Hyderabad]:Cuisine[T.japanese,thai]                     0.1501      0.261      0.575      0.565      -0.362       0.662
City_Reducida[T.Indore]:Cuisine[T.japanese,thai]                       -0.2221      0.450     -0.493      0.622      -1.105       0.661
City_Reducida[T.Jaipur]:Cuisine[T.japanese,thai]                        0.2484      0.497      0.500      0.617      -0.726       1.223
City_Reducida[T.Kolkata]:Cuisine[T.japanese,thai]                       0.3846      0.175      2.199      0.028       0.042       0.728
City_Reducida[T.Lucknow]:Cuisine[T.japanese,thai]                    3.176e-14   1.44e-13      0.220      0.826   -2.51e-13    3.15e-13
City_Reducida[T.Mumbai]:Cuisine[T.japanese,thai]                    -4.954e-14   2.49e-13     -0.199      0.843   -5.38e-13    4.39e-13
City_Reducida[T.Mysore]:Cuisine[T.japanese,thai]                        0.0337      0.449      0.075      0.940      -0.846       0.913
City_Reducida[T.Navi Mumbai]:Cuisine[T.japanese,thai]                1.143e-13   4.95e-13      0.231      0.817   -8.56e-13    1.08e-12
City_Reducida[T.New Delhi]:Cuisine[T.japanese,thai]                     0.3493      0.381      0.916      0.360      -0.399       1.097
City_Reducida[T.Noida]:Cuisine[T.japanese,thai]                         0.2579      0.312      0.828      0.408      -0.353       0.869
City_Reducida[T.Other]:Cuisine[T.japanese,thai]                         0.4513      0.243      1.859      0.063      -0.025       0.927
City_Reducida[T.Pune]:Cuisine[T.japanese,thai]                          0.2966      0.261      1.137      0.256      -0.215       0.808
City_Reducida[T.Unknown]:Cuisine[T.japanese,thai]                       0.2540      0.276      0.920      0.358      -0.287       0.795
City_Reducida[T.chennai]:Cuisine[T.japanese,thai]                    1.526e-14   6.63e-14      0.230      0.818   -1.15e-13    1.45e-13
City_Reducida[T.pune]:Cuisine[T.japanese,thai]                         -0.1086      0.445     -0.244      0.807      -0.982       0.764
City_Reducida[T.Bhubaneswar]:Cuisine[T.nigerian,cajun]               8.801e-15   4.12e-14      0.214      0.831    -7.2e-14    8.96e-14
City_Reducida[T.Chandigarh]:Cuisine[T.nigerian,cajun]                 1.22e-14   5.98e-14      0.204      0.838   -1.05e-13     1.3e-13
City_Reducida[T.Chennai]:Cuisine[T.nigerian,cajun]                     -0.1294      0.649     -0.199      0.842      -1.402       1.143
City_Reducida[T.Greater Noida]:Cuisine[T.nigerian,cajun]             -1.61e-14   8.52e-14     -0.189      0.850   -1.83e-13    1.51e-13
City_Reducida[T.Gurgaon]:Cuisine[T.nigerian,cajun]                     -0.7044      0.450     -1.565      0.118      -1.587       0.178
City_Reducida[T.Hyderabad]:Cuisine[T.nigerian,cajun]                   -0.1356      0.582     -0.233      0.816      -1.276       1.005
City_Reducida[T.Indore]:Cuisine[T.nigerian,cajun]                    2.487e-14   1.11e-13      0.225      0.822   -1.92e-13    2.42e-13
City_Reducida[T.Jaipur]:Cuisine[T.nigerian,cajun]                   -1.196e-14   3.17e-14     -0.377      0.706   -7.41e-14    5.02e-14
City_Reducida[T.Kolkata]:Cuisine[T.nigerian,cajun]                     -0.5809      0.411     -1.412      0.158      -1.388       0.226
City_Reducida[T.Lucknow]:Cuisine[T.nigerian,cajun]                     -0.5160      0.729     -0.708      0.479      -1.945       0.913
City_Reducida[T.Mumbai]:Cuisine[T.nigerian,cajun]                      -0.2437      0.503     -0.484      0.628      -1.230       0.743
City_Reducida[T.Mysore]:Cuisine[T.nigerian,cajun]                    8.915e-15   1.85e-14      0.481      0.630   -2.74e-14    4.52e-14
City_Reducida[T.Navi Mumbai]:Cuisine[T.nigerian,cajun]              -3.003e-16   9.29e-15     -0.032      0.974   -1.85e-14    1.79e-14
City_Reducida[T.New Delhi]:Cuisine[T.nigerian,cajun]                 1.089e-14   2.15e-14      0.507      0.612   -3.12e-14     5.3e-14
City_Reducida[T.Noida]:Cuisine[T.nigerian,cajun]                       -0.5188      0.489     -1.061      0.289      -1.478       0.440
City_Reducida[T.Other]:Cuisine[T.nigerian,cajun]                       -0.3034      0.389     -0.780      0.435      -1.066       0.459
City_Reducida[T.Pune]:Cuisine[T.nigerian,cajun]                        -0.0243      0.585     -0.041      0.967      -1.172       1.123
City_Reducida[T.Unknown]:Cuisine[T.nigerian,cajun]                   8.643e-15   2.36e-14      0.366      0.714   -3.76e-14    5.49e-14
City_Reducida[T.chennai]:Cuisine[T.nigerian,cajun]                     -0.7101      0.558     -1.272      0.204      -1.805       0.385
City_Reducida[T.pune]:Cuisine[T.nigerian,cajun]                      7.619e-15   3.54e-14      0.215      0.830   -6.18e-14    7.71e-14
City_Reducida[T.Bhubaneswar]:Cuisine[T.peruvian,cuban]               5.504e-15   1.49e-14      0.370      0.711   -2.37e-14    3.47e-14
City_Reducida[T.Chandigarh]:Cuisine[T.peruvian,cuban]               -7.601e-15   2.36e-14     -0.322      0.748   -5.39e-14    3.87e-14
City_Reducida[T.Chennai]:Cuisine[T.peruvian,cuban]                      0.6757      0.425      1.592      0.112      -0.157       1.508
City_Reducida[T.Greater Noida]:Cuisine[T.peruvian,cuban]             2.796e-14   1.21e-13      0.232      0.817   -2.09e-13    2.65e-13
City_Reducida[T.Gurgaon]:Cuisine[T.peruvian,cuban]                      0.1233      0.320      0.385      0.700      -0.504       0.751
City_Reducida[T.Hyderabad]:Cuisine[T.peruvian,cuban]                    0.0617      0.281      0.220      0.826      -0.489       0.613
City_Reducida[T.Indore]:Cuisine[T.peruvian,cuban]                      7.6e-15   4.72e-14      0.161      0.872   -8.49e-14       1e-13
City_Reducida[T.Jaipur]:Cuisine[T.peruvian,cuban]                       0.6834      0.446      1.533      0.125      -0.191       1.557
City_Reducida[T.Kolkata]:Cuisine[T.peruvian,cuban]                      0.1957      0.291      0.673      0.501      -0.375       0.766
City_Reducida[T.Lucknow]:Cuisine[T.peruvian,cuban]                  -1.812e-14   6.85e-14     -0.265      0.791   -1.52e-13    1.16e-13
City_Reducida[T.Mumbai]:Cuisine[T.peruvian,cuban]                   -1.051e-14   2.13e-14     -0.493      0.622   -5.23e-14    3.13e-14
City_Reducida[T.Mysore]:Cuisine[T.peruvian,cuban]                       0.2028      0.341      0.595      0.552      -0.466       0.872
City_Reducida[T.Navi Mumbai]:Cuisine[T.peruvian,cuban]              -2.239e-14   9.16e-14     -0.244      0.807   -2.02e-13    1.57e-13
City_Reducida[T.New Delhi]:Cuisine[T.peruvian,cuban]                   -0.3646      0.425     -0.858      0.391      -1.198       0.469
City_Reducida[T.Noida]:Cuisine[T.peruvian,cuban]                        0.3946      0.298      1.326      0.185      -0.189       0.978
City_Reducida[T.Other]:Cuisine[T.peruvian,cuban]                        0.6468      0.260      2.490      0.013       0.138       1.156
City_Reducida[T.Pune]:Cuisine[T.peruvian,cuban]                         0.6701      0.320      2.094      0.036       0.043       1.298
City_Reducida[T.Unknown]:Cuisine[T.peruvian,cuban]                     -0.0299      0.295     -0.101      0.919      -0.608       0.548
City_Reducida[T.chennai]:Cuisine[T.peruvian,cuban]                    -1.4e-14   5.14e-14     -0.272      0.785   -1.15e-13    8.68e-14
City_Reducida[T.pune]:Cuisine[T.peruvian,cuban]                        -0.1285      0.245     -0.524      0.600      -0.609       0.352
City_Reducida[T.Bhubaneswar]:Cuisine[T.polish,jewish]                   0.5767      0.684      0.843      0.399      -0.765       1.919
City_Reducida[T.Chandigarh]:Cuisine[T.polish,jewish]                -1.433e-15   6.08e-15     -0.236      0.814   -1.34e-14    1.05e-14
City_Reducida[T.Chennai]:Cuisine[T.polish,jewish]                       0.9452      0.403      2.344      0.019       0.155       1.736
City_Reducida[T.Greater Noida]:Cuisine[T.polish,jewish]                -0.2704      0.285     -0.947      0.344      -0.830       0.289
City_Reducida[T.Gurgaon]:Cuisine[T.polish,jewish]                       0.3297      0.350      0.942      0.346      -0.357       1.016
City_Reducida[T.Hyderabad]:Cuisine[T.polish,jewish]                     0.5712      0.284      2.009      0.045       0.014       1.129
City_Reducida[T.Indore]:Cuisine[T.polish,jewish]                        0.7631      0.336      2.269      0.023       0.104       1.423
City_Reducida[T.Jaipur]:Cuisine[T.polish,jewish]                        0.0807      0.599      0.135      0.893      -1.094       1.256
City_Reducida[T.Kolkata]:Cuisine[T.polish,jewish]                      -0.2692      0.461     -0.584      0.559      -1.173       0.634
City_Reducida[T.Lucknow]:Cuisine[T.polish,jewish]                    3.002e-14   1.17e-13      0.256      0.798      -2e-13     2.6e-13
City_Reducida[T.Mumbai]:Cuisine[T.polish,jewish]                        0.1730      0.279      0.620      0.535      -0.374       0.720
City_Reducida[T.Mysore]:Cuisine[T.polish,jewish]                    -3.307e-14   1.48e-13     -0.223      0.824   -3.24e-13    2.58e-13
City_Reducida[T.Navi Mumbai]:Cuisine[T.polish,jewish]                   0.4170      0.442      0.942      0.346      -0.450       1.284
City_Reducida[T.New Delhi]:Cuisine[T.polish,jewish]                     0.2879      0.462      0.623      0.534      -0.619       1.195
City_Reducida[T.Noida]:Cuisine[T.polish,jewish]                         0.6861      0.301      2.283      0.023       0.097       1.276
City_Reducida[T.Other]:Cuisine[T.polish,jewish]                         0.7280      0.253      2.878      0.004       0.232       1.224
City_Reducida[T.Pune]:Cuisine[T.polish,jewish]                          0.6076      0.287      2.115      0.035       0.044       1.171
City_Reducida[T.Unknown]:Cuisine[T.polish,jewish]                       0.3823      0.287      1.330      0.184      -0.181       0.946
City_Reducida[T.chennai]:Cuisine[T.polish,jewish]                    -4.87e-15   2.29e-14     -0.213      0.831   -4.97e-14       4e-14
City_Reducida[T.pune]:Cuisine[T.polish,jewish]                       9.075e-15   3.73e-14      0.243      0.808    -6.4e-14    8.22e-14
City_Reducida[T.Bhubaneswar]:Cuisine[T.swedish,greek]               -5.043e-15   9.08e-15     -0.556      0.579   -2.28e-14    1.28e-14
City_Reducida[T.Chandigarh]:Cuisine[T.swedish,greek]                    0.8827      0.685      1.288      0.198      -0.461       2.226
City_Reducida[T.Chennai]:Cuisine[T.swedish,greek]                       0.8813      0.444      1.986      0.047       0.011       1.751
City_Reducida[T.Greater Noida]:Cuisine[T.swedish,greek]              2.201e-15   6.82e-15      0.323      0.747   -1.12e-14    1.56e-14
City_Reducida[T.Gurgaon]:Cuisine[T.swedish,greek]                       0.4497      0.377      1.192      0.233      -0.290       1.189
City_Reducida[T.Hyderabad]:Cuisine[T.swedish,greek]                     0.2554      0.287      0.891      0.373      -0.307       0.818
City_Reducida[T.Indore]:Cuisine[T.swedish,greek]                     2.461e-15   1.23e-14      0.200      0.841   -2.16e-14    2.65e-14
City_Reducida[T.Jaipur]:Cuisine[T.swedish,greek]                    -3.222e-15   8.79e-15     -0.366      0.714   -2.05e-14     1.4e-14
City_Reducida[T.Kolkata]:Cuisine[T.swedish,greek]                      -0.3075      0.463     -0.665      0.506      -1.214       0.599
City_Reducida[T.Lucknow]:Cuisine[T.swedish,greek]                      -0.1030      0.571     -0.181      0.857      -1.222       1.016
City_Reducida[T.Mumbai]:Cuisine[T.swedish,greek]                    -9.956e-15   4.19e-14     -0.238      0.812   -9.21e-14    7.22e-14
City_Reducida[T.Mysore]:Cuisine[T.swedish,greek]                        0.5681      0.336      1.690      0.091      -0.091       1.227
City_Reducida[T.Navi Mumbai]:Cuisine[T.swedish,greek]                   0.2408      0.334      0.721      0.471      -0.414       0.895
City_Reducida[T.New Delhi]:Cuisine[T.swedish,greek]                     0.2569      0.401      0.641      0.521      -0.529       1.042
City_Reducida[T.Noida]:Cuisine[T.swedish,greek]                         0.7214      0.312      2.314      0.021       0.110       1.333
City_Reducida[T.Other]:Cuisine[T.swedish,greek]                         0.6282      0.260      2.415      0.016       0.118       1.138
City_Reducida[T.Pune]:Cuisine[T.swedish,greek]                          0.1664      0.309      0.539      0.590      -0.439       0.771
City_Reducida[T.Unknown]:Cuisine[T.swedish,greek]                       0.2262      0.302      0.749      0.454      -0.366       0.818
City_Reducida[T.chennai]:Cuisine[T.swedish,greek]                      -0.6798      0.600     -1.132      0.258      -1.857       0.498
City_Reducida[T.pune]:Cuisine[T.swedish,greek]                          0.5470      0.449      1.219      0.223      -0.333       1.427
City_Reducida[T.Bhubaneswar]:Cuisine[T.tibetan,greek]                   0.9629      0.543      1.773      0.076      -0.102       2.028
City_Reducida[T.Chandigarh]:Cuisine[T.tibetan,greek]                    0.1806      0.521      0.347      0.729      -0.841       1.202
City_Reducida[T.Chennai]:Cuisine[T.tibetan,greek]                       0.7271      0.369      1.970      0.049       0.003       1.451
City_Reducida[T.Greater Noida]:Cuisine[T.tibetan,greek]                -0.0923      0.208     -0.443      0.658      -0.501       0.316
City_Reducida[T.Gurgaon]:Cuisine[T.tibetan,greek]                       0.1584      0.256      0.618      0.536      -0.344       0.661
City_Reducida[T.Hyderabad]:Cuisine[T.tibetan,greek]                     0.1246      0.230      0.541      0.588      -0.327       0.576
City_Reducida[T.Indore]:Cuisine[T.tibetan,greek]                    -1.345e-16   1.54e-15     -0.088      0.930   -3.14e-15    2.88e-15
City_Reducida[T.Jaipur]:Cuisine[T.tibetan,greek]                        0.1351      0.452      0.299      0.765      -0.750       1.021
City_Reducida[T.Kolkata]:Cuisine[T.tibetan,greek]                       0.2050      0.139      1.478      0.140      -0.067       0.477
City_Reducida[T.Lucknow]:Cuisine[T.tibetan,greek]                      -0.0379      0.561     -0.068      0.946      -1.137       1.061
City_Reducida[T.Mumbai]:Cuisine[T.tibetan,greek]                       -0.1712      0.117     -1.457      0.145      -0.402       0.059
City_Reducida[T.Mysore]:Cuisine[T.tibetan,greek]                        0.0981      0.175      0.562      0.574      -0.245       0.441
City_Reducida[T.Navi Mumbai]:Cuisine[T.tibetan,greek]                   0.0417      0.191      0.219      0.827      -0.332       0.416
City_Reducida[T.New Delhi]:Cuisine[T.tibetan,greek]                    -0.0077      0.322     -0.024      0.981      -0.639       0.624
City_Reducida[T.Noida]:Cuisine[T.tibetan,greek]                         0.2035      0.251      0.811      0.418      -0.289       0.696
City_Reducida[T.Other]:Cuisine[T.tibetan,greek]                         0.4329      0.216      2.009      0.045       0.010       0.855
City_Reducida[T.Pune]:Cuisine[T.tibetan,greek]                          0.2017      0.237      0.850      0.395      -0.263       0.667
City_Reducida[T.Unknown]:Cuisine[T.tibetan,greek]                       0.1075      0.247      0.435      0.663      -0.377       0.592
City_Reducida[T.chennai]:Cuisine[T.tibetan,greek]                       0.2226      0.409      0.544      0.586      -0.579       1.024
City_Reducida[T.pune]:Cuisine[T.tibetan,greek]                          0.1095      0.222      0.493      0.622      -0.326       0.545
City_Reducida[T.Bhubaneswar]:Cuisine[T.tibetan,italian]                 1.1754      0.594      1.978      0.048       0.010       2.340
City_Reducida[T.Chandigarh]:Cuisine[T.tibetan,italian]                  0.2479      0.594      0.418      0.676      -0.916       1.412
City_Reducida[T.Chennai]:Cuisine[T.tibetan,italian]                     0.7369      0.381      1.936      0.053      -0.010       1.483
City_Reducida[T.Greater Noida]:Cuisine[T.tibetan,italian]                    0          0        nan        nan           0           0
City_Reducida[T.Gurgaon]:Cuisine[T.tibetan,italian]                    -0.0401      0.286     -0.140      0.888      -0.600       0.520
City_Reducida[T.Hyderabad]:Cuisine[T.tibetan,italian]                   0.0664      0.245      0.271      0.787      -0.415       0.547
City_Reducida[T.Indore]:Cuisine[T.tibetan,italian]                     -0.4901      0.446     -1.100      0.272      -1.364       0.384
City_Reducida[T.Jaipur]:Cuisine[T.tibetan,italian]                     -0.1017      0.414     -0.246      0.806      -0.913       0.709
City_Reducida[T.Kolkata]:Cuisine[T.tibetan,italian]                     0.0902      0.165      0.546      0.585      -0.234       0.414
City_Reducida[T.Lucknow]:Cuisine[T.tibetan,italian]                     0.5146      0.564      0.913      0.361      -0.591       1.620
City_Reducida[T.Mumbai]:Cuisine[T.tibetan,italian]                      0.0960      0.239      0.402      0.688      -0.372       0.564
City_Reducida[T.Mysore]:Cuisine[T.tibetan,italian]                      0.1699      0.195      0.870      0.384      -0.213       0.553
City_Reducida[T.Navi Mumbai]:Cuisine[T.tibetan,italian]                -0.1401      0.436     -0.321      0.748      -0.995       0.715
City_Reducida[T.New Delhi]:Cuisine[T.tibetan,italian]                  -0.2101      0.339     -0.619      0.536      -0.875       0.455
City_Reducida[T.Noida]:Cuisine[T.tibetan,italian]                       0.0736      0.269      0.274      0.784      -0.453       0.600
City_Reducida[T.Other]:Cuisine[T.tibetan,italian]                       0.4465      0.230      1.938      0.053      -0.005       0.898
City_Reducida[T.Pune]:Cuisine[T.tibetan,italian]                        0.1041      0.254      0.410      0.682      -0.394       0.602
City_Reducida[T.Unknown]:Cuisine[T.tibetan,italian]                    -0.1137      0.266     -0.427      0.669      -0.636       0.408
City_Reducida[T.chennai]:Cuisine[T.tibetan,italian]                     0.3716      0.594      0.626      0.532      -0.793       1.537
City_Reducida[T.pune]:Cuisine[T.tibetan,italian]                        0.1109      0.325      0.341      0.733      -0.527       0.749
City_Reducida[T.Bhubaneswar]:Cuisine[T.turkish,nigerian]                0.8577      0.680      1.261      0.207      -0.476       2.191
City_Reducida[T.Chandigarh]:Cuisine[T.turkish,nigerian]                 0.1801      0.680      0.265      0.791      -1.154       1.514
City_Reducida[T.Chennai]:Cuisine[T.turkish,nigerian]                    0.6116      0.387      1.582      0.114      -0.146       1.370
City_Reducida[T.Greater Noida]:Cuisine[T.turkish,nigerian]                   0          0        nan        nan           0           0
City_Reducida[T.Gurgaon]:Cuisine[T.turkish,nigerian]                    0.2414      0.306      0.789      0.430      -0.358       0.841
City_Reducida[T.Hyderabad]:Cuisine[T.turkish,nigerian]                  0.2803      0.267      1.051      0.293      -0.243       0.803
City_Reducida[T.Indore]:Cuisine[T.turkish,nigerian]                    -0.7667      0.447     -1.717      0.086      -1.642       0.109
City_Reducida[T.Jaipur]:Cuisine[T.turkish,nigerian]                     0.5897      0.495      1.191      0.234      -0.381       1.561
City_Reducida[T.Kolkata]:Cuisine[T.turkish,nigerian]                    0.1594      0.331      0.482      0.630      -0.489       0.808
City_Reducida[T.Lucknow]:Cuisine[T.turkish,nigerian]                         0          0        nan        nan           0           0
City_Reducida[T.Mumbai]:Cuisine[T.turkish,nigerian]                    -0.1199      0.195     -0.616      0.538      -0.502       0.262
City_Reducida[T.Mysore]:Cuisine[T.turkish,nigerian]                          0          0        nan        nan           0           0
City_Reducida[T.Navi Mumbai]:Cuisine[T.turkish,nigerian]                0.2454      0.437      0.562      0.574      -0.611       1.102
City_Reducida[T.New Delhi]:Cuisine[T.turkish,nigerian]                  0.0019      0.335      0.006      0.995      -0.656       0.660
City_Reducida[T.Noida]:Cuisine[T.turkish,nigerian]                      0.1583      0.262      0.604      0.546      -0.356       0.672
City_Reducida[T.Other]:Cuisine[T.turkish,nigerian]                      0.4131      0.232      1.780      0.075      -0.042       0.868
City_Reducida[T.Pune]:Cuisine[T.turkish,nigerian]                       0.2692      0.267      1.007      0.314      -0.255       0.794
City_Reducida[T.Unknown]:Cuisine[T.turkish,nigerian]                   -0.1699      0.267     -0.637      0.524      -0.693       0.353
City_Reducida[T.chennai]:Cuisine[T.turkish,nigerian]                   -0.3031      0.495     -0.613      0.540      -1.273       0.667
City_Reducida[T.pune]:Cuisine[T.turkish,nigerian]                            0          0        nan        nan           0           0
City_Reducida[T.Bhubaneswar]:Cuisine[T.turkish,sapnish]                      0          0        nan        nan           0           0
City_Reducida[T.Chandigarh]:Cuisine[T.turkish,sapnish]                 -0.0076      0.553     -0.014      0.989      -1.091       1.076
City_Reducida[T.Chennai]:Cuisine[T.turkish,sapnish]                     0.3439      0.411      0.838      0.402      -0.461       1.149
City_Reducida[T.Greater Noida]:Cuisine[T.turkish,sapnish]                    0          0        nan        nan           0           0
City_Reducida[T.Gurgaon]:Cuisine[T.turkish,sapnish]                     0.2565      0.334      0.767      0.443      -0.399       0.912
City_Reducida[T.Hyderabad]:Cuisine[T.turkish,sapnish]                   0.0329      0.293      0.112      0.911      -0.542       0.608
City_Reducida[T.Indore]:Cuisine[T.turkish,sapnish]                     -0.0825      0.288     -0.287      0.774      -0.647       0.481
City_Reducida[T.Jaipur]:Cuisine[T.turkish,sapnish]                      0.4671      0.599      0.779      0.436      -0.708       1.642
City_Reducida[T.Kolkata]:Cuisine[T.turkish,sapnish]                     0.4918      0.467      1.054      0.292      -0.423       1.407
City_Reducida[T.Lucknow]:Cuisine[T.turkish,sapnish]                     0.5227      0.570      0.917      0.359      -0.595       1.640
City_Reducida[T.Mumbai]:Cuisine[T.turkish,sapnish]                      0.0534      0.250      0.214      0.831      -0.437       0.543
City_Reducida[T.Mysore]:Cuisine[T.turkish,sapnish]                     -0.5524      0.451     -1.226      0.220      -1.436       0.331
City_Reducida[T.Navi Mumbai]:Cuisine[T.turkish,sapnish]                      0          0        nan        nan           0           0
City_Reducida[T.New Delhi]:Cuisine[T.turkish,sapnish]                  -0.1363      0.363     -0.375      0.708      -0.849       0.576
City_Reducida[T.Noida]:Cuisine[T.turkish,sapnish]                       0.4373      0.283      1.544      0.123      -0.118       0.993
City_Reducida[T.Other]:Cuisine[T.turkish,sapnish]                       0.3646      0.243      1.499      0.134      -0.112       0.842
City_Reducida[T.Pune]:Cuisine[T.turkish,sapnish]                        0.6259      0.316      1.983      0.047       0.007       1.245
City_Reducida[T.Unknown]:Cuisine[T.turkish,sapnish]                    -0.1824      0.294     -0.621      0.535      -0.759       0.394
City_Reducida[T.chennai]:Cuisine[T.turkish,sapnish]                    -0.4225      0.599     -0.705      0.481      -1.598       0.753
City_Reducida[T.pune]:Cuisine[T.turkish,sapnish]                        0.1266      0.334      0.379      0.704      -0.528       0.781
City_Reducida[T.Bhubaneswar]:Cuisine[T.welsh,thai]                      0.0273      0.681      0.040      0.968      -1.307       1.362
City_Reducida[T.Chandigarh]:Cuisine[T.welsh,thai]                       0.3236      0.685      0.472      0.637      -1.021       1.668
City_Reducida[T.Chennai]:Cuisine[T.welsh,thai]                          0.6579      0.384      1.713      0.087      -0.095       1.411
City_Reducida[T.Greater Noida]:Cuisine[T.welsh,thai]                   -0.2365      0.250     -0.946      0.344      -0.727       0.254
City_Reducida[T.Gurgaon]:Cuisine[T.welsh,thai]                          0.1503      0.282      0.534      0.594      -0.402       0.702
City_Reducida[T.Hyderabad]:Cuisine[T.welsh,thai]                        0.0045      0.250      0.018      0.986      -0.485       0.494
City_Reducida[T.Indore]:Cuisine[T.welsh,thai]                          -0.8708      0.447     -1.948      0.052      -1.747       0.006
City_Reducida[T.Jaipur]:Cuisine[T.welsh,thai]                           0.1406      0.416      0.338      0.735      -0.674       0.956
City_Reducida[T.Kolkata]:Cuisine[T.welsh,thai]                         -0.5258      0.179     -2.940      0.003      -0.876      -0.175
City_Reducida[T.Lucknow]:Cuisine[T.welsh,thai]                          0.2075      0.532      0.390      0.697      -0.836       1.251
City_Reducida[T.Mumbai]:Cuisine[T.welsh,thai]                          -0.0650      0.188     -0.347      0.729      -0.433       0.303
City_Reducida[T.Mysore]:Cuisine[T.welsh,thai]                          -0.2068      0.328     -0.630      0.529      -0.851       0.437
City_Reducida[T.Navi Mumbai]:Cuisine[T.welsh,thai]                     -0.0012      0.236     -0.005      0.996      -0.464       0.462
City_Reducida[T.New Delhi]:Cuisine[T.welsh,thai]                       -0.1328      0.327     -0.406      0.685      -0.774       0.508
City_Reducida[T.Noida]:Cuisine[T.welsh,thai]                            0.2206      0.265      0.833      0.405      -0.298       0.740
City_Reducida[T.Other]:Cuisine[T.welsh,thai]                            0.2230      0.228      0.979      0.328      -0.224       0.670
City_Reducida[T.Pune]:Cuisine[T.welsh,thai]                             0.0915      0.265      0.345      0.730      -0.428       0.611
City_Reducida[T.Unknown]:Cuisine[T.welsh,thai]                         -0.2861      0.261     -1.095      0.273      -0.798       0.226
City_Reducida[T.chennai]:Cuisine[T.welsh,thai]                         -0.4450      0.495     -0.898      0.369      -1.416       0.526
City_Reducida[T.pune]:Cuisine[T.welsh,thai]                            -1.1067      0.327     -3.381      0.001      -1.749      -0.465
Hygiene_Rating                                                          0.0787      0.008     10.067      0.000       0.063       0.094
Years_Open                                                             -0.0032      0.030     -0.107      0.915      -0.061       0.055
Instagram_Popularity_Quotient                                          -0.0045      0.006     -0.696      0.486      -0.017       0.008
Years_Open:Instagram_Popularity_Quotient                                0.0009      0.000      2.108      0.035       6e-05       0.002
Liquour_License_Obtained                                                4.3173      0.287     15.056      0.000       3.755       4.880
Liquour_License_Obtained:Resturant_Type[T.Buffet/Family Restaurant]     0.2760      0.236      1.167      0.243      -0.188       0.740
Liquour_License_Obtained:Resturant_Type[T.Caffee]                      -4.3093      0.289    -14.906      0.000      -4.876      -3.742
Liquour_License_Obtained:Resturant_Type[T.Gastro Bar]                  -0.0648      0.038     -1.684      0.092      -0.140       0.011
Resturant_Tier                                                         -0.4323      0.160     -2.710      0.007      -0.745      -0.120
Live_Music_Rating_missing                                              -0.0391      0.068     -0.579      0.563      -0.171       0.093
Resturant_Tier:Live_Music_Rating_missing                                0.0473      0.035      1.353      0.176      -0.021       0.116
Value_Deals_Rating_missing                                              0.0064      0.247      0.026      0.979      -0.478       0.491
Endorsed_By[T.Not Specific]:Value_Deals_Rating_missing                  0.0361      0.248      0.146      0.884      -0.450       0.523
Endorsed_By[T.Tier A Celebrity]:Value_Deals_Rating_missing             -0.0125      0.249     -0.050      0.960      -0.500       0.475
==============================================================================
Omnibus:                      218.525   Durbin-Watson:                   2.041
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              991.151
Skew:                           0.074   Prob(JB):                    5.95e-216
Kurtosis:                       5.645   Cond. No.                     7.86e+19
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 7.62e-31. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
                                              sum_sq      df           F  \
City_Reducida                             -73.025518    20.0  -16.738603   
Cuisine                                    22.470394    19.0    5.421652   
Resturant_Type                              0.745436     3.0    1.139104   
Endorsed_By                                 2.392316     2.0    5.483567   
City_Reducida:Cuisine                     116.943371   380.0    1.410804   
Hygiene_Rating                             22.107875     1.0  101.349484   
Years_Open                                 26.350417     1.0  120.798638   
Instagram_Popularity_Quotient              21.286172     1.0   97.582538   
Years_Open:Instagram_Popularity_Quotient    0.968915     1.0    4.441812   
Liquour_License_Obtained                    1.712197     1.0    7.849252   
Liquour_License_Obtained:Resturant_Type    49.129520     3.0   75.075083   
Resturant_Tier                              9.019409     1.0   41.347822   
Live_Music_Rating_missing                   5.283550     1.0   24.221463   
Resturant_Tier:Live_Music_Rating_missing    0.399265     1.0    1.830357   
Value_Deals_Rating_missing                  0.294460     1.0    1.349897   
Endorsed_By:Value_Deals_Rating_missing      0.413173     2.0    0.947058   
Residual                                  666.184436  3054.0         NaN   

                                                PR(>F)  
City_Reducida                             1.000000e+00  
Cuisine                                   2.753396e-13  
Resturant_Type                            3.202416e-01  
Endorsed_By                               4.195491e-03  
City_Reducida:Cuisine                     1.655090e-05  
Hygiene_Rating                            1.784145e-23  
Years_Open                                1.380936e-27  
Instagram_Popularity_Quotient             1.125691e-22  
Years_Open:Instagram_Popularity_Quotient  3.515032e-02  
Liquour_License_Obtained                  5.116203e-03  
Liquour_License_Obtained:Resturant_Type   7.535881e-47  
Resturant_Tier                            1.473473e-10  
Live_Music_Rating_missing                 9.042276e-07  
Resturant_Tier:Live_Music_Rating_missing  1.761856e-01  
Value_Deals_Rating_missing                2.453869e-01  
Endorsed_By:Value_Deals_Rating_missing    3.879945e-01  
Residual                                           NaN  
C:\Users\Admin\AppData\Local\Programs\Python\Python313\Lib\site-packages\statsmodels\base\model.py:1894: ValueWarning:

covariance of constraints does not have full rank. The number of constraints is 3, but rank is 2

C:\Users\Admin\AppData\Local\Programs\Python\Python313\Lib\site-packages\statsmodels\base\model.py:1894: ValueWarning:

covariance of constraints does not have full rank. The number of constraints is 380, but rank is 286

In [61]:
import statsmodels.formula.api as smf
df = data_def.rename(columns={
    'Fire Audit': 'Fire_Audit',
    'Liquor License Obtained': 'Liquor_License_Obtained',
    'Annual Turnover Log': 'Annual_Turnover_Log'
})

model = smf.ols(
    'Annual_Turnover_Log ~ Liquor_License_Obtained * Fire_Audit',
    data=df
).fit()

print(model.summary())
                             OLS Regression Results                            
===============================================================================
Dep. Variable:     Annual_Turnover_Log   R-squared:                       0.003
Model:                             OLS   Adj. R-squared:                  0.002
Method:                  Least Squares   F-statistic:                     3.047
Date:                 Fri, 01 Aug 2025   Prob (F-statistic):             0.0276
Time:                         16:48:20   Log-Likelihood:                -2844.6
No. Observations:                 3493   AIC:                             5697.
Df Residuals:                     3489   BIC:                             5722.
Df Model:                            3                                         
Covariance Type:             nonrobust                                         
======================================================================================================
                                         coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------------------------
Intercept                             16.9987      0.165    103.140      0.000      16.676      17.322
Liquor_License_Obtained                0.0572      0.166      0.345      0.730      -0.268       0.383
Fire_Audit                            -0.1379      0.193     -0.716      0.474      -0.516       0.240
Liquor_License_Obtained:Fire_Audit     0.1813      0.194      0.934      0.350      -0.199       0.562
==============================================================================
Omnibus:                       77.063   Durbin-Watson:                   1.984
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              158.643
Skew:                          -0.096   Prob(JB):                     3.56e-35
Kurtosis:                       4.026   Cond. No.                         68.6
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
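The `Liquor_License_Obtained * Fire_Audit` term in the formula is shorthand: patsy expands `A * B` into the two main effects plus their interaction, `A + B + A:B`, which is why the summary above shows four coefficients including the intercept. A self-contained sketch on synthetic data (toy columns `a` and `b`, not the restaurant dataset) that makes the expansion visible:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Toy data covering all four (a, b) cells so the model is full rank
df = pd.DataFrame({
    'y': [1.0, 2.0, 3.0, 4.0, 5.0, 6.5],
    'a': [0, 0, 1, 1, 0, 1],
    'b': [0, 1, 0, 1, 1, 0],
})

m = smf.ols('y ~ a * b', data=df).fit()

# 'a * b' expands to a + b + a:b: intercept, two main effects, one interaction
print(m.params.index.tolist())
```

For two binary indicators, the interaction coefficient is the extra effect of having both flags set, over and above the sum of the individual effects.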
In [62]:
import statsmodels.formula.api as smf
df = data_def.rename(columns={
    'Situated in a Multi Complex': 'Situated_in_a_Multi_Complex',
    'Dedicated Parking': 'Dedicated_Parking',
    'Annual Turnover Log': 'Annual_Turnover_Log'
})

model = smf.ols(
    'Annual_Turnover_Log ~ Situated_in_a_Multi_Complex * Dedicated_Parking',
    data=df
).fit()

print(model.summary())
                             OLS Regression Results                            
===============================================================================
Dep. Variable:     Annual_Turnover_Log   R-squared:                       0.002
Model:                             OLS   Adj. R-squared:                  0.002
Method:                  Least Squares   F-statistic:                     2.884
Date:                 Fri, 01 Aug 2025   Prob (F-statistic):             0.0344
Time:                         16:49:35   Log-Likelihood:                -2844.8
No. Observations:                 3493   AIC:                             5698.
Df Residuals:                     3489   BIC:                             5722.
Df Model:                            3                                         
Covariance Type:             nonrobust                                         
=================================================================================================================
                                                    coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------------------------------------
Intercept                                        17.0638      0.047    362.683      0.000      16.972      17.156
Situated_in_a_Multi_Complex                      -0.0186      0.052     -0.355      0.723      -0.121       0.084
Dedicated_Parking                                 0.0769      0.053      1.460      0.144      -0.026       0.180
Situated_in_a_Multi_Complex:Dedicated_Parking    -0.0347      0.059     -0.592      0.554      -0.150       0.080
==============================================================================
Omnibus:                       72.968   Durbin-Watson:                   1.983
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              145.344
Skew:                          -0.100   Prob(JB):                     2.75e-32
Kurtosis:                       3.979   Cond. No.                         18.6
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [63]:
import statsmodels.formula.api as smf
df = data_def.rename(columns={
    'Food Rating': 'Food_Rating',
    'Service': 'Service',
    'Annual Turnover Log': 'Annual_Turnover_Log'
})

model = smf.ols(
    'Annual_Turnover_Log ~ Food_Rating * Service',
    data=df
).fit()

print(model.summary())
                             OLS Regression Results                            
===============================================================================
Dep. Variable:     Annual_Turnover_Log   R-squared:                       0.000
Model:                             OLS   Adj. R-squared:                 -0.001
Method:                  Least Squares   F-statistic:                    0.3113
Date:                 Fri, 01 Aug 2025   Prob (F-statistic):              0.817
Time:                         16:49:53   Log-Likelihood:                -2848.7
No. Observations:                 3493   AIC:                             5705.
Df Residuals:                     3489   BIC:                             5730.
Df Model:                            3                                         
Covariance Type:             nonrobust                                         
=======================================================================================
                          coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------
Intercept              17.1741      0.109    157.326      0.000      16.960      17.388
Food_Rating            -0.0096      0.014     -0.679      0.497      -0.037       0.018
Service                -0.0192      0.022     -0.862      0.389      -0.063       0.024
Food_Rating:Service     0.0021      0.003      0.744      0.457      -0.003       0.008
==============================================================================
Omnibus:                       77.061   Durbin-Watson:                   1.982
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              158.033
Skew:                          -0.099   Prob(JB):                     4.83e-35
Kurtosis:                       4.023   Cond. No.                         473.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [41]:
def categorizar_valor(x):
    if x in [1, 2, 3]:
        return 'Bajo'
    elif x in [4, 5]:
        return 'Medio'
    elif x in [6, 7, 8]:
        return 'Alto'
    else:
        return None
In [42]:
data_def['Liquor License Obtained Cat'] = data_def['Liquor License Obtained'].map({1: 'Si', 0: 'No'})
data_def['Value Deals Rating_missing Cat'] = data_def['Value Deals Rating_missing'].apply(categorizar_valor)
In [43]:
data_def['city_cuisine']=data_def['City_Reducida']+"_"+data_def['Cuisine']
data_def['popularity_years']=data_def['Instagram Popularity Quotient']*data_def['Years_Open']
data_def['tier_music']=data_def['Resturant Tier']*data_def['Live Music Rating_missing']
data_def['liquor_type']=data_def['Liquor License Obtained Cat']+"_"+data_def['Restaurant Type']
data_def['values_endorsed']=data_def['Endorsed By']+"_"+data_def['Value Deals Rating_missing Cat']
In [44]:
data_def = data_def[['Annual Turnover Log','Hygiene Rating','City_Reducida','Cuisine','city_cuisine','Years_Open','Instagram Popularity Quotient','popularity_years','Liquor License Obtained Cat','Restaurant Type','liquor_type','Resturant Tier','Live Music Rating_missing','tier_music','Endorsed By','Value Deals Rating_missing Cat','values_endorsed']]
In [45]:
data_def['Instagram Popularity Quotient'] = data_def['Instagram Popularity Quotient'].fillna(data_def['Instagram Popularity Quotient'].median())
data_def['popularity_years'] = data_def['popularity_years'].fillna(data_def['popularity_years'].median())
data_def['Resturant Tier'] = data_def['Resturant Tier'].fillna(data_def['Resturant Tier'].median())
data_def['tier_music'] = data_def['tier_music'].fillna(data_def['tier_music'].median())

One-hot encoding is applied simply to turn the categorical variables into numeric indicator columns, and then everything is recombined into a single dataset.

In [46]:
catCol = data_def.select_dtypes(include= ['object'] ).columns.to_list()
numCol = data_def.select_dtypes(include= ['float64','int64'] ).columns.to_list()

preprocesador = ColumnTransformer([('onehot', OneHotEncoder(handle_unknown = 'ignore'),catCol)],remainder = 'passthrough')


datospre =preprocesador.fit_transform(data_def)
In [47]:
codCat=preprocesador.named_transformers_['onehot'].get_feature_names_out(catCol)
labels = np.concatenate([codCat,numCol])
In [48]:
datosProc = pd.DataFrame(datospre.toarray(), columns=labels)

The plot of the Annual Turnover distribution shows points, likely outliers, in its tail, i.e. at the far end of the distribution. We therefore look for them with the interquartile-range (IQR) method, applied to the log transform of Annual Turnover, since that is the scale that behaves linearly.
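The 1.5×IQR rule can be sketched on synthetic data (the values below are invented; only the fencing logic mirrors the cell that follows):

```python
import numpy as np
import pandas as pd

# Synthetic log-scale turnover values with two planted outliers at the ends
rng = np.random.default_rng(0)
s = pd.Series(np.concatenate([rng.normal(17.0, 0.5, 100), [25.0, 8.0]]))

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only the rows inside the fences; the planted outliers fall outside
kept = s[s.between(lower, upper)]
print(f"kept {len(kept)} of {len(s)} rows")
```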

In [49]:
df_med=datosProc
Q1 = df_med['Annual Turnover Log'].quantile(0.25)
Q3 = df_med['Annual Turnover Log'].quantile(0.75)
IQR = Q3 - Q1

lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

filtered_data = df_med[(df_med['Annual Turnover Log'] >= lower_bound) & (df_med['Annual Turnover Log'] <= upper_bound)]

print(f"Filtrados: {filtered_data.shape[0]} filas (sin outliers)")
Filtrados: 3436 filas (sin outliers)

The interaction term between restaurant type and restaurant location is created; this term was already verified earlier.

We then split X and y and use train_test_split to obtain the training and test sets, with 70% of the dataset for training and 30% for testing.

In [50]:
x=filtered_data.drop(columns=['Annual Turnover Log'])
y=filtered_data[['Annual Turnover Log']]
In [51]:
xEntrenamiento, xPrueba, yEntrenamiento, yPrueba = train_test_split(x,y,train_size=0.7, random_state=123)

For the hyperparameter search we use GridSearchCV, evaluating four metrics: mean squared error (the average of the squared differences between predictions and actual values), mean absolute error (the average of the absolute differences between predictions and actual values), R² (the proportion of the target's variance that the model explains), and mean absolute percentage error (the average absolute error expressed as a fraction of the actual values). Finally, KFold performs the cross-validation: it splits the data into K folds, each iteration uses one fold as test and the remaining K−1 for training, and the chosen metrics are averaged across folds.
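A minimal sketch of this scheme on synthetic data (shapes and coefficients are arbitrary): KFold yields one score per fold and we average them.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression problem standing in for the training set
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=cv, scoring='neg_mean_squared_error')

# sklearn negates error metrics so that "greater is better" holds for every scorer
print(len(scores), -scores.mean())
```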

In [52]:
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.metrics import (
    mean_squared_error,
    mean_absolute_error,
    r2_score,
    mean_absolute_percentage_error
)
import pandas as pd

def run_grid_search(
    X_train, y_train,
    X_test=None, y_test=None,
    estimator=None,
    param_grid=None,
    cv_splits=5,
    metrics=None,
    random_state=42,
    n_jobs=-1,
    verbose=0
):
    if metrics is None:
        metrics = [
            'neg_mean_squared_error',
            'neg_mean_absolute_error',
            'r2',
            'neg_mean_absolute_percentage_error'
        ]

    scoring = {m: m for m in metrics}
    refit_metric = metrics[0]

    inner_cv = KFold(n_splits=cv_splits, shuffle=True, random_state=random_state)

    grid = GridSearchCV(
        estimator=estimator,
        param_grid=param_grid,
        scoring=scoring,
        cv=inner_cv,
        refit=refit_metric,
        return_train_score=False,
        n_jobs=n_jobs,
        verbose=verbose
    )
    grid.fit(X_train, y_train)

    # Convert cv_results_ to a DataFrame and flip the neg_* scores back to positive
    df = pd.DataFrame(grid.cv_results_)
    for m in metrics:
        col = f"mean_test_{m}"
        # Negate any neg_* metric back to its natural sign and rename it
        if m.startswith('neg_'):
            name = m.replace('neg_', '')
            df[f"mean_test_{name}"] = -df[col]
        else:
            df[f"mean_test_{m}"] = df[col]

    results = {
        'best_estimator': grid.best_estimator_,
        'best_params':    grid.best_params_,
        'cv_results_df':  df
    }

    # Predictions on the training set
    y_train_pred = grid.best_estimator_.predict(X_train)
    results['train_scores'] = {
        'mse': mean_squared_error(y_train, y_train_pred),
        'mae': mean_absolute_error(y_train, y_train_pred),
        'r2':  r2_score(y_train, y_train_pred),
        'mape': mean_absolute_percentage_error(y_train, y_train_pred)
    }

    # Predictions on the test set (if provided)
    if X_test is not None and y_test is not None:
        y_test_pred = grid.best_estimator_.predict(X_test)
        results['test_scores'] = {
            'mse': mean_squared_error(y_test, y_test_pred),
            'mae': mean_absolute_error(y_test, y_test_pred),
            'r2':  r2_score(y_test, y_test_pred),
            'mape': mean_absolute_percentage_error(y_test, y_test_pred)
        }

    return results
In [53]:
ridge = Ridge()
param_grid = {
    'alpha': [0.01, 0.1, 1.0, 10.0, 100.0],
    'fit_intercept': [True, False]
}


results = run_grid_search(
    xEntrenamiento, yEntrenamiento,
    xPrueba, yPrueba,
    estimator=ridge,
    param_grid=param_grid,
    cv_splits=5,
    metrics=['neg_mean_squared_error', 'neg_mean_absolute_error', 'r2','neg_mean_absolute_percentage_error'],
    verbose=1
)

print("Mejores parámetros:", results['best_params'])
print("Métricas en entrenamiento:", results['train_scores'])
print("Métricas en prueba:", results['test_scores'])
Fitting 5 folds for each of 10 candidates, totalling 50 fits
Mejores parámetros: {'alpha': 100.0, 'fit_intercept': True}
Métricas en entrenamiento: {'mse': 0.17636010747782754, 'mae': 0.3305633488317981, 'r2': 0.2996276389933501, 'mape': 0.019476920294949726}
Métricas en prueba: {'mse': 0.18057752081411288, 'mae': 0.3390956918508284, 'r2': 0.27037822479784723, 'mape': 0.019929175499234633}

The test metrics being very close to the training metrics indicates there is no notable overfitting and that the model generalizes well with the applied regularization. An R² of about 0.30 means the model explains roughly 30% of the variability in annual turnover with the chosen features. An MAE of 0.33 on typical values of our log-transformed annual-turnover variable amounts to barely a 2% average error, which is very reasonable. The MAPE is 1.9% in both training and test; so even though the R² is only 30%, the average absolute percentage error is quite low.
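To make the MAE-vs-MAPE reading concrete: with a target centered near 17 on the log scale, an MAE around 0.33 works out to roughly 2% in MAPE terms (toy numbers below, chosen only to mirror those magnitudes):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error

y_true = np.array([17.0, 16.5, 17.5, 17.2])           # typical log-turnover values
y_pred = y_true + np.array([0.3, -0.35, 0.33, -0.3])  # errors of MAE-like size

mae = mean_absolute_error(y_true, y_pred)
mape = mean_absolute_percentage_error(y_true, y_pred)
print(round(mae, 3), round(mape, 4))  # MAE ≈ 0.32, MAPE just under 2%
```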

In [56]:
lr = LinearRegression()
param_grid_lr = {
    'fit_intercept': [True, False],
    'copy_X':        [True, False],
    'positive':      [False, True], 
    'tol':           [1e-6, 1e-4, 1e-2] 
}
results_lr = run_grid_search(
    xEntrenamiento, yEntrenamiento,
    X_test=xPrueba, y_test=yPrueba,
    estimator=lr,
    param_grid=param_grid_lr,
    cv_splits=5,
    metrics=['neg_mean_squared_error','neg_mean_absolute_error','r2','neg_mean_absolute_percentage_error'],
    verbose=1
)
print("Mejores parámetros LR:", results_lr['best_params'])
print("Train scores:", results_lr['train_scores'])
print("Test  scores:", results_lr['test_scores'])
Fitting 5 folds for each of 24 candidates, totalling 120 fits
Mejores parámetros LR: {'copy_X': True, 'fit_intercept': True, 'positive': False, 'tol': 1e-06}
Train scores: {'mse': 0.15197310863979677, 'mae': 0.3003089325411634, 'r2': 0.39647482398503175, 'mape': 0.01766871584954887}
Test  scores: {'mse': 0.20276154539115934, 'mae': 0.3535110486610771, 'r2': 0.18074388205097414, 'mape': 0.02077405901428926}

The training and test metrics are far apart, which indicates that this model overfits: training yields an R² of almost 40%, but on the test set the metric drops markedly to about 18%. This overfitting may stem from there not being enough data to fit the regression. As shown earlier, features such as city included categories with a single observation (a frequency of 1); we tried to reduce these instabilities throughout the exercise, but plain linear regression is still unable to generalize the patterns in the data.

In [54]:
xgb_model = xgb.XGBRegressor(
    objective='reg:squarederror',
    random_state=42,
    n_jobs=-1,
    verbosity=0
)

param_grid_xgb = {
    'n_estimators':      [100, 300],
    'max_depth':         [3, 5, 7],
    'learning_rate':     [0.01, 0.05, 0.1],
    'reg_alpha':         [0.1, 1.0],
    'reg_lambda':        [1.0, 5.0]
}

results_xgb = run_grid_search(
    xEntrenamiento, yEntrenamiento,
    X_test=xPrueba, y_test=yPrueba,
    estimator=xgb_model,
    param_grid=param_grid_xgb,
    cv_splits=5,
    metrics=['neg_mean_squared_error','neg_mean_absolute_error','r2','neg_mean_absolute_percentage_error'],
    verbose=1
)

print("Mejores parámetros XGB:", results_xgb['best_params'])
print("Train scores:", results_xgb['train_scores'])
print("Test  scores:", results_xgb['test_scores'])
Fitting 5 folds for each of 72 candidates, totalling 360 fits
Mejores parámetros XGB: {'learning_rate': 0.05, 'max_depth': 3, 'n_estimators': 300, 'reg_alpha': 1.0, 'reg_lambda': 5.0}
Train scores: {'mse': 0.15002329647541046, 'mae': 0.30378881096839905, 'r2': 0.40421807765960693, 'mape': 0.017894936725497246}
Test  scores: {'mse': 0.17858749628067017, 'mae': 0.3348684012889862, 'r2': 0.27841895818710327, 'mape': 0.019669145345687866}

This XGBRegressor configuration favors gradual, controlled learning: it uses shallow trees that contribute low-order interactions and, by ensembling many of them, reduces variance and avoids overfitting. The moderate learning rate forces more boosting iterations to refine the model, while L1 regularization helps discard irrelevant gains and L2 stabilizes the weights, yielding a robust model that captures non-linearities without memorizing noise. The results are quite similar to the Ridge regression; XGBoost performs very well when the data contain non-linearities, and since the distribution of our annual-turnover variable is non-linear it is an excellent model to apply. It still needs some tuning: the training and test R² show no clear overfitting, but the model could improve with a better dataset.

In [58]:
ada_model = AdaBoostRegressor(
    random_state=42
)
param_grid_ada = {
    'n_estimators':      [50, 100],
    'learning_rate':     [0.01, 0.1, 0.5, 1.0],
    'loss':              ['linear', 'square', 'exponential']
}
results_ada = run_grid_search(
    xEntrenamiento, yEntrenamiento,
    X_test=xPrueba, y_test=yPrueba,
    estimator=ada_model,
    param_grid=param_grid_ada,
    cv_splits=5,
    metrics=['neg_mean_squared_error','neg_mean_absolute_error','r2','neg_mean_absolute_percentage_error'],
    verbose=1
)

print("Mejores parámetros AdaBoost:", results_ada['best_params'])
print("Train scores:", results_ada['train_scores'])
print("Test  scores:", results_ada['test_scores'])
Fitting 5 folds for each of 24 candidates, totalling 120 fits
Mejores parámetros AdaBoost: {'learning_rate': 0.1, 'loss': 'linear', 'n_estimators': 50}
Train scores: {'mse': 0.19472923541976003, 'mae': 0.35479893564680326, 'r2': 0.22667900174021194, 'mape': 0.020874243888165972}
Test  scores: {'mse': 0.19962382022404196, 'mae': 0.3618164819170651, 'r2': 0.1934218310902973, 'mape': 0.021224991608803208}

The tuned AdaBoostRegressor (learning_rate=0.1, loss='linear', n_estimators=50) obtained a training MSE of 0.195 with an R² of 0.23, while on test the MSE rose slightly to 0.20 with an R² of 0.19. Train and test results are quite similar, which indicates no overfitting; rather, with such a conservative learning rate and so few estimators the ensemble underfits and fails to capture much of the signal. Although it proves to be a fairly stable model, its performance is below both Ridge and XGBoost.

In [59]:
rf_model = RandomForestRegressor(random_state=42)

param_grid_rf = {
    'n_estimators':      [100, 200, 500],
    'max_depth':         [None, 10, 20],
    'min_samples_split': [2, 5, 10],
    'max_features':      [1.0, 'sqrt']   # 'auto' was removed in scikit-learn 1.3; 1.0 (all features) is its regression equivalent
}

results_rf = run_grid_search(
    xEntrenamiento, yEntrenamiento,
    X_test=xPrueba, y_test=yPrueba,
    estimator=rf_model,
    param_grid=param_grid_rf,
    cv_splits=5,
    metrics=[
        'neg_mean_squared_error',
        'neg_mean_absolute_error',
        'r2',
        'neg_mean_absolute_percentage_error'
    ],
    verbose=1
)

print("Mejores parámetros Random Forest:", results_rf['best_params'])
print("Train scores:", results_rf['train_scores'])
print("Test  scores:", results_rf['test_scores'])
Fitting 5 folds for each of 54 candidates, totalling 270 fits
Mejores parámetros Random Forest: {'max_depth': None, 'max_features': 'sqrt', 'min_samples_split': 10, 'n_estimators': 500}
Train scores: {'mse': 0.08513670294764214, 'mae': 0.2286458507790223, 'r2': 0.6618997657434607, 'mape': 0.013473102652427574}
Test  scores: {'mse': 0.18120873630053536, 'mae': 0.3341117541575857, 'r2': 0.2678278045591468, 'mape': 0.019640572424019304}

There is clear overfitting in this model: the training R² was 66% while the test set yields only about 27%. Random forests overfit when their fully grown trees memorize noise in the training data; unlike boosting, each tree is fit independently on a bootstrap sample, so the fix is to constrain the trees (max_depth, min_samples_leaf) rather than to pass errors between them.
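One way to probe this kind of overfitting: on synthetic, mostly-noise data (invented for illustration), a forest with unconstrained leaves memorizes the training set, while raising min_samples_leaf narrows the train/test gap.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Invented data: weak signal in one feature, the rest is noise -> easy to overfit
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))
y = X[:, 0] + rng.normal(scale=1.0, size=400)

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

gaps = {}
for leaf in (1, 20):  # 1 = fully grown trees, 20 = constrained leaves
    rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=leaf,
                               random_state=0).fit(Xtr, ytr)
    gaps[leaf] = rf.score(Xtr, ytr) - rf.score(Xte, yte)
    print(leaf, round(rf.score(Xtr, ytr), 2), round(rf.score(Xte, yte), 2))
```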

Given this picture we can only choose between two models, the Ridge estimator and XGBoost, so we will verify linearity, homoscedasticity, and normality of the residuals.

Ridge Estimator¶

In [55]:
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.stats.diagnostic import het_breuschpagan
import statsmodels.api as sm

y_true_orig = pd.Series(
    yPrueba.iloc[:, 0],
    index=yPrueba.index
)

y_pred = results['best_estimator'].predict(xPrueba)
y_pred_orig = pd.Series(
    y_pred,
    index=yPrueba.index
)

resid_orig = y_true_orig - y_pred_orig

plt.figure(figsize=(8,5))
plt.scatter(y_pred_orig, resid_orig, alpha=0.6)
plt.axhline(0, color='red', linewidth=1.5, linestyle='--')
plt.xlabel('Predicted values')
plt.ylabel('Residuals')
plt.title('Residuals vs Predictions')
plt.tight_layout()
plt.show()
No description has been provided for this image
In [56]:
plt.figure(figsize=(8, 4))
plt.boxplot(resid_orig, vert=False, patch_artist=True,
            boxprops=dict(facecolor='lightgray', color='black'),
            medianprops=dict(color='red'),
            flierprops=dict(marker='o', markerfacecolor='black', markersize=5))
plt.title('Evaluate the distribution of residuals for Normality')
plt.xlabel('Residuals')
plt.yticks([])
plt.show()
No description has been provided for this image
In [57]:
import statsmodels.api as sm

resid = resid_orig
# sm.qqplot creates its own figure; grab it instead of opening an empty one
fig = sm.qqplot(resid, line='45', fit=True, alpha=0.6)
fig.set_size_inches(6, 6)
plt.title("Residuals QQ-plot")
plt.tight_layout()
plt.show()
No description has been provided for this image

We see that the Ridge model satisfies all the linear-model assumptions: homoscedasticity, normality of the residuals, and linearity. It also performs very similarly on training and test, so there is no overfitting; finally, a MAPE of ≈2% means excellent relative precision.

In [58]:
y_pred = results['best_estimator'].predict(xPrueba)
In [59]:
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10, 6))

sns.scatterplot(
    x=range(len(yPrueba)), 
    y=yPrueba['Annual Turnover Log'], 
    label='Actual', 
    color='blue', 
    alpha=0.5
)

sns.scatterplot(
    x=range(len(y_pred)), 
    y=y_pred.ravel(), 
    label='Predicted (Ridge)', 
    color='orange', 
    alpha=0.6
)

plt.title("Comparación de puntos: Turnover real vs predicho")
plt.xlabel("Índice")
plt.ylabel("Annual Turnover Log")
plt.legend()
plt.show()
No description has been provided for this image

On the other hand, the plot shows that Ridge has trouble predicting extreme values; the predictions also cluster around the mean of the actual values, so the Ridge model does capture the overall central tendency.

XGBoost Estimator¶

In [60]:
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.stats.diagnostic import het_breuschpagan
import statsmodels.api as sm

y_true_orig = pd.Series(
    yPrueba.iloc[:, 0],
    index=yPrueba.index
)

y_pred = results_xgb['best_estimator'].predict(xPrueba)
y_pred_orig = pd.Series(
    y_pred,
    index=yPrueba.index
)

resid_orig = y_true_orig - y_pred_orig

plt.figure(figsize=(8,5))
plt.scatter(y_pred_orig, resid_orig, alpha=0.6)
plt.axhline(0, color='red', linewidth=1.5, linestyle='--')
plt.xlabel('Predicted values')
plt.ylabel('Residuals')
plt.title('Residuals vs Predictions')
plt.tight_layout()
plt.show()
No description has been provided for this image
In [61]:
plt.figure(figsize=(8, 4))
plt.boxplot(resid_orig, vert=False, patch_artist=True,
            boxprops=dict(facecolor='lightgray', color='black'),
            medianprops=dict(color='red'),
            flierprops=dict(marker='o', markerfacecolor='black', markersize=5))
plt.title('Evaluate the distribution of residuals for Normality')
plt.xlabel('Residuals')
plt.yticks([])
plt.show()
No description has been provided for this image
In [62]:
import statsmodels.api as sm

resid = resid_orig

# sm.qqplot creates its own figure; grab it instead of opening an empty one
fig = sm.qqplot(resid, line='45', fit=True, alpha=0.6)
fig.set_size_inches(6, 6)
plt.title("Residuals QQ-plot")
plt.tight_layout()
plt.show()
No description has been provided for this image
In [63]:
y_pred = results_xgb['best_estimator'].predict(xPrueba)
In [64]:
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10, 6))

sns.scatterplot(
    x=range(len(yPrueba)), 
    y=yPrueba['Annual Turnover Log'], 
    label='Actual', 
    color='blue', 
    alpha=0.5
)

sns.scatterplot(
    x=range(len(y_pred)), 
    y=y_pred.ravel(), 
    label='Predicted (XGBoost)', 
    color='orange', 
    alpha=0.6
)

plt.title("Comparación de puntos: Turnover real vs predicho")
plt.xlabel("Índice")
plt.ylabel("Annual Turnover Log")
plt.legend()
plt.show()
No description has been provided for this image

The XGBoost model also satisfies all the linearity criteria, as the plots show. Its metrics are slightly better than Ridge's: the training R² was 40.4% against about 30% for Ridge. On test it drops to roughly 28%, which may be due to the test split. Recall also that we modified some variables to reduce dimensionality, on top of the feature engineering, and that some features are very unbalanced (City had cities with a single observation), so improving data collection would improve the test results.

The best model depends on the goal. On one hand, XGBoost has the best R², i.e. it captures the variability of annual turnover best, at 40%. On the other, Ridge has a lower R² but may be more stable, since its training and test performance are very similar. If the goal is maximum predictive accuracy, XGBoost is superior; if the aim is to understand more clearly how the independent variables affect annual turnover, a linear model such as Ridge is preferable for its interpretability and lower complexity. In my case I will prioritize predictive power, so I choose XGBoost as the best model.

Linear regression model:

Annual_Turnover_Log ~ Hygiene_Rating
+ City_Reducida
+ Cuisine
+ City_Reducida × Cuisine
+ Years_Open
+ Instagram_Popularity_Quotient
+ Years_Open × Instagram_Popularity_Quotient
+ Liquor_License_Obtained
+ Restaurant_Type
+ Liquor_License_Obtained × Restaurant_Type
+ Resturant_Tier
+ Live_Music_Rating_missing
+ Resturant_Tier × Live_Music_Rating_missing
+ Endorsed_By
+ Value_Deals_Rating_missing
+ Endorsed_By × Value_Deals_Rating_missing

Interactions:

  • City_Reducida * Cuisine: assesses whether the impact of the cuisine type on turnover changes by city.
  • Years_Open * Instagram_Popularity_Quotient: examines whether the restaurant's age modifies the effect of its social-media popularity.
  • Liquor_License_Obtained * Restaurant_Type: determines whether holding a liquor license has a different impact depending on the restaurant type.
  • Resturant_Tier * Live_Music_Rating: analyzes whether the live-music rating has a different effect depending on the restaurant's tier.
  • Endorsed_By * Value_Deals_Rating: assesses which kinds of value deals restaurants offer depending on who endorses them.
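In patsy/statsmodels notation each `a * b` term above expands to `a + b + a:b`. A minimal sketch with an invented mini-frame (column names borrowed from the model, data synthetic):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Invented data just to show how one interaction term of the formula expands
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Years_Open': rng.integers(1, 30, 200).astype(float),
    'Instagram_Popularity_Quotient': rng.uniform(0, 100, 200),
})
df['Annual_Turnover_Log'] = (17 + 0.01 * df['Years_Open']
                             + 0.001 * df['Instagram_Popularity_Quotient']
                             + rng.normal(scale=0.1, size=200))

m = smf.ols('Annual_Turnover_Log ~ Years_Open * Instagram_Popularity_Quotient',
            data=df).fit()

# Intercept, both main effects, and the Years_Open:Instagram_Popularity_Quotient term
print(list(m.params.index))
```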